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EXECUTIVE  SUMMARY 


This  report  consists  of  three  self  contained  parts. 
Part  A,  entitled  "Noise  Suppression  and  Signal  Estimation," 
presents  a  summary  of  various  adaptive  noise-cancelli ng 
prefilter  techniques  that  were  investigated.  Part  B,  enti¬ 
tled  "Autoregressive  Spectral  Estimation  in  Noise  for  Speech 
Analysis  Applications,"  presents  a  fundamentally  new 
"weighted  information"  approach  to  the  spectral  estimation 
problem.  Finally,  Part  C  presents  a  summary  of  the  publi¬ 
cation  activities  associated  with  this  research  effort. 

Each  part  of  this  report  is  self-contained  with  sep¬ 
arate  contents,  list  of  figures,  references,  etc.  To  assist 
the  reader,  a  brief  summary  of  parts  A  and  B  along  with  a 
general  table  of  contents  is  provided  below. 

A.  NOISE  SUPPRESSION  AND  SIGNAL  ESTIMATION 

The  goal  of  this  research  effort  was  to  investigate 
adaptive  estimation  methods  for  noise  suppression  and  per¬ 
formance  enhancement  of  narrowband  coding  systems  for  speech 
signals.  In  the  original  proposal  special  reference  was 
made  to  combining  pitch  tracking  adaptive  filters  witn  Lin¬ 
ear  predictive  coding  algori tnms  and  tnese  methods  are  dis¬ 
cussed  in  chapter  II  of  part  A.  The  results  of  thi3  initial 


effort  led  to  the  investigation  of  several  topics  which  are 
briefly  summarized  below. 

In  chapter  il  of  part  A  various  prefiltering  techniques 
for  improving  linear  predictive  coding  systems  are  eval¬ 
uated.  Thi3  includes  the  examination  of  a  ->refilter  con¬ 
sisting  of  an  adaptive  digital  predictor  (ADP)  with  pitch 
period  delay.  Two  adaptive  algorithms,  the  least  mean 
squared  and  the  sequential  regression,  are  evaluateu  for 
ADP.  The  method  proved  successful  in  suppressing  white 
noise  in  voiced  speech  sounds  but  did  not  work  well  when  the 
noise  was  narrowband  such  as  a  single  sine  wave.  This  is 
due  to  the  fact  that  interaction  between  the  pitch  period 
and  the  narrowband  noise  produces  a  bias  error  in  the  adap¬ 
tation  of  the  filter. 

Chapter  III  of  part  A  discusses  a  new  robust  pitch 
estimation  procedure  which  was  developed  as  an  outgrowth  of 
tne  efforts  described  in  chapter  II.  The  performance  of  the 
pi  tch  tracking  filter  depends  on  the  quality  of  the  pitch 
period  estimate  used  to  set  the  input  delay.  It  wa3  found 
that  a  tapped  delay  line  adaptive  digital  filter  provides  a 
robust  pi  ten  estimate.  This  method  allows  for  better  reso¬ 
lution  of  the  pitch  frequency  than  the  traditional  tech¬ 
niques  such  as  autocorrelation  and  harmonic  analysis  and  has 
a  better  noise  tolerance  than  these  technqiues. 

Failure  of  the  pitch  tracking  adaptive  filters  to  sup¬ 
press  narrowband  noise  prompted  the  investigation  of  several 
otner  prefiltering  methods.  The  most  successful  of  the 
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filters  evaluated  was  the  spectral  subtraction  techni iue 
discussed  in  chapter  IV  of  part  A.  Two  modifications  to  the 
original  method  proved  very  useful  for  improving  noisy 
speech.  First  a  dual  time  constant  noise  spectrum  estimate 
improved  wni te  noise  suppression  and  secondly  a  spectral 
notch  feature  greatly  improved  narrowband  noise  quieting. 
Also,  very  successful  was  a  new  filter  method  based  on  adap¬ 
tive  filtering.  These  filters  have  been  investigated  using 
speech  in  the  military  aircraft  environment,  such  as  heli¬ 
copter,  AWACS,  etc.  Informal  listening  tests  indicate  good 
noise  suppression. 

Chapter  V  of  part  A  discusses  two-di mensi onal  filter 
approaches.  The  use  of  a  two-dimensional  filter  approach  to 
suppress  noise  in  the  short-time  Fourier  transform  domain 
found  to  have  a  significant  potential.  From  the  informal 
listening  tests,  the  two  dimensional  processed  speech  sounds 
quieter  with  added  clarity. 

Chapters  VI  and  VII  of  Part  A  are  concerned  wi  tn  fa3t 
algorithms  for  efficient  solution  of  the  linear  estimation 
problem  and  a  new  recursive  linear  estimator  suitable  for 
rapid  estimation  of  a  signal  in  noise.  Efficient  methods 
are  developed  for  optimization  of  the  filter  coefficients. 
Optimal  selection  of  data  to  be  processed  is  shown  to  be 
related  to  a  classic  integer  programming  problem. 


3.  AUT  0REGRE33IVE  SPECTRAL  ESTIMATION  IN  NOISE  IN  THE 
CONTEXT  OP  SPEECH  ANALYSIS 


In  tni s  part  of  the  report,  an  improved  method  of  spec¬ 
tral  estimation  is  described.  The  method  treats  the  problem 
of  estimating  autoregressive  (AR)  process  parameters  from 
sequential  discrete  time  observation  corrupted  by  additive 
independent  noise  with  known  power  spectral  density.  The 
method  has  a  theoretical  foundation  relating  it  to  princi¬ 
ples  of  information  theory  as  well  as  the  linear  predictive 
(LP)  procedures  popularly  employed  for  speech  analysis.  The 
estimation  procedure  enjoys  the  advantages  of  noise  suppres¬ 
sion  filtering  metnods  such  as  those  discussed  in  Part  A. 
Furthermore  the  method  is  able  to  relax  the  usual  LPC  error 
criterion  in  those  spectral  regions  where  the  residual 
speech  distortion  due  to  noise  suppression  is  expected  to  be 
nigh.  The  method  is  very  general  and  applies  both  to  wide¬ 
band  and  narrowband  noise  env j ronments .  The  solution  ob- 


tai ned  by  using  this  method  is  shown  to  be  unique. 

Computational  procedures  appropriate  for  speech  anal¬ 
ysis  applications  are  developed.  The  complexity  of  these 
algorithms  for  speech  applications  is  only  moderately  expen¬ 
sive  when  compared  to  the  present  used  methods.  The  algo¬ 
rithms  have  been  tested  using  simulated  Gaussian  and  speech 
signals  with  known  spectral  characteristics  corrupted  by 
simulated  Gaussian  noise  of  known  power  spectral  density. 
The  results  obtained  by  this  method  are  compared  with  the 
results  obtained  by  currently  popular  methods  of  AR  spectral 


TABLE  OF  CONTENTS 


Part  A:  NOISE  SUPPRESSION  AND  SIGNAL  ESTIMATION  . 


Table  of  Contents 
List  of  Figures  . 


Chapter 


I. 

II. 

III. 

IV. 


VI. 

VII. 

VIII. 
Cited 


Introduction  . 

Prefiltering  with  Adaptive  Digital 

Predictor  . 

Pitch  Estimation  by  Adaptive  Filtering 
Noise  Suppression  Methods  for  Speech 

Applications  . 

Enhancement  of  Speech  Signals  by  Two 
Dimensional  Signal  Processing  .  .  .  . 

Discrete  Time  Estimation  . 

Design  of  Fast  Recursive  Estimators 

Conclusions  . 

References . . . 


Part  B.  AUTOREGRESSIVE  SPECTRAL  ESTIMATION  IN 

NOISE  FOR  SPEECH  ANALYSIS  APPLICATIONS  . 

Table  of  Contents  . 

List  of  Figures . 

Chapter 

I.  Introduction  . 

II.  General  Discussion  . 

III.  Theoretical  Formulation  . 

IV.  Computational  Formulation  . 

V.  Results  . 

VI.  Conclusions  . 

Cited  References  . 


Part  C.  SUMMARY  OF  PUBLICAT  ION  ACT IV LUES  .  .  .  . 
PUBLICATION  ACTIVITIES  . 


vii 


} 


1 »  '  ■ '  *»■  %  • 


»*«  *’  .  •  *  •  *  »  ‘  •  *  »  '  «  *  •  * 


TABLE  OF  CONTENTS 


Page 

Chapter 


I.  INTRODUCTION  .  6 

f  ■  Noise  Suppression .  6 

Fast  Recursive  Estimators  . .  9 

i  II.  PREFILTERING  WITH  AN  ADAPTIVE  DIGITAL 

;  PREDICTOR .  10 

Filter  Configurations .  11 

Data  Generation .  12 

jj  Pitch  Estimate  Error .  13 

P  Narrowband  Noise  .  17 

Conclusion .  17 

!;  III.  PITCH  ESTIMATION  BY  ADAPTIVE  FILTERING  ....  20 

(The  General  Time  Delay  Case .  21 

The  Pitch  Estimation  Configuration  .  22 

The  Adaptive  Configuration  .  24 

Visible  Detail  .  25 

Pitch  Period  Resolution  .  27 

Noise  Tolerance .  27 

I  Computational  Considerations  .  27 

F  Possible  Applications  .  28 

IV.  NOISE  SUPPRESSION  METHODS  FOR  SPEECH 

APPLICATIONS  .  31 

^  Methods  Investigated . 31 

*  General  Spectral  Subtraction  .  32 

f  The  Noise  PSD  Estimate  . . 33 

r  Frequency  Domain  Notch  Filter  .  34 

Filter  Design  by  Inverse  Transform .  34 

’■  Filter  Design  by  Notch  Placement .  35 

t  Filter  Design  by  Adaptive  Filter  .  37 

Comparison  of  Filter  Methods  .  39 

I-  Objective  Comparisons .  40 

Subjective  Comparisons  .  41 

Conclusions .  4-2 


Page 


V.  ENHANCEMENT  OP  SPEECH  SIGNALS  BY  TWO 
DIMENSIONAL  SIGNAL  PROCESSING  .  .  . 


Two  Dimensional  Representation  . 

Filtering  in  the  Double  Transformed  Domain  .  . 
Conclusions . . . 


DISCRETE  TIME  ESTIMATION 


Set  Up  of  Recursion  Equations  . 

Structure  of  B.  Bp . . 

Special  Structure^of  Equation  (6.11)  .  . 

Intermediate  Steps  . 

Pinal  Solutions  . 


DESIGN  OP  PAST  RECURSIVE  ESTIMATORS 


Problem  Statement  .  , 

Parameter  Optimization  .... 
Optimizing  the  Selection  Vector 

The  Design  Algorithm  . 

Efficient  Matrix  Inversion  .  .  , 

An  Example . . 

Discussion  .  , 


VIII.  CONCLUSIONS 


Suggestions  for  Further  Study 


CITED  REFERENCES 


.**•  **  .** 


'  j*  "  •>  *"„i*  *  •  *  * 


W  W\>  \ 

.  _**  s»  , 


LIST  OF  FIGURES 


Figure 

1. 

2. 

3- 

4- 

5. 

6. 

7. 

8. 

9- 

10. 

11. 

12. 

13- 

14. 

15. 

16. 


Page 


ADP  Configuration .  12 

Test  Signal-time  and  frequency  plot .  13 

Noise  Signal  Spectra .  14 

Pitch  Error  Sensitivities,  LMS  algorithm, 

1-3  dB  SNR  input,  four  and  eight  weight 

filters .  16 

Pitch  Error  Sensitivity,  LMS  algorithm, 

1.3  dB  SNR,  eight  weight  filter .  16 

ADP  Performance,  LMS  algorithm, 

eight  weight  filter .  18 

ADP  Performance,  SER  algorithm,  eight 

weight  filter  .  18 

General  Configuration  for  a  time  delay 

TDLADF .  23 

TDLADF  for  pitch  period  estimation  .  23 

TDLADF  weight  plot  for  the  sentence, 

"they  shook  hands  for  good  luck" .  26 

Expanded  weight  plot  for  the  start  of 

the  word  "they" .  26 

Expanded  weight  plot  using  SER  adaptive 
algorithm .  26 

Pitch  frequency  error  for  a  TDLADF  on 

the  synthetic  vowel  /a/  corrupted  by 

White  Noise .  29 

Two  stage  TDLADF  for  pitch  estimation  ....  29 

Adaptive  predictor  for  speech  filtering  ...  37 

Adaptive  Transfer  Filter  .  39 


U 


SNR  performance  . 

Log  area  ratio  performance  . 

Block  diagram  for  a  two-dimensional 
speech  filter  . 

Clean  speech  spectrogram  and  two-dimensional 
transform  . 

Noisy  speech  spectrogram  and  two-dimensional 
transform  . 

Results  at  each  step  of  a  two-dimensional 
filter  process  . 


CHAPTER  I 


INTRODUCTION 

Digital  encoding  of  speech  signals  has  proven  to  be  an 
effective  means  of  bandwidth  compression.  Expanding  use  of 
narrowband  digital  encoding  systems  has  revealed  some  disap¬ 
pointing  limitations  in  their  performance.  One  limitation 
is  degredation  of  intelligibility  in  the  presence  of  back¬ 
ground  noise  [1,2].  This  difficulty  has  drawn  Inore  atten¬ 
tion  as  speech  encoding  systems  have  attempted  to  move  into 
military  applications .  These  applications  may  include  the 
environments  such  as  airborne  command  posts,  cockpits  of  jet 
aircraft,  helicopters  and  many  others.  The  goal  of  this 
study  was  to  investigate  several  methods  of  improving  the 
performance  of  narrowband  coding  systems  in  the  presence  of 
acoustically  coupled  background  noise.  To  complement  this, 
an  additional  goal  was  to  investigate  fast  algorithms  to 
implement  these  algorithms  along  with  descrete-time  estima¬ 
tion  algorithms. 

Noise  Suppression 

The  problem  of  noise  in  speech  encoders  can  be 
addressed  as  a  simple  noise  filter  problem  by  placing  a 
simple  filter  in  front  of  the  encoder.  The  problem  with 


this  approach  is  that  most  background  noise  of  interest  is 
not  stationary  and  cannot  be  predicted  ahead  of  time.  Noise 
filters  with  adaptive  structures  have  shown  some  promise  as 
prefilters  for  speech  encoding  systems  [3].  Several  of  the 
adaptive  filter  techniques  were  investigated  in  this 
study.  Chapter  II  describes  an  adaptive  filter  which  takes 
advantage  of  pitch  information  to  reduce  background  noise. 
Two  adaptive  algorithms,  the  Least  Mean  Square  Algorithm 
(LMS)  [4]  and  the  Sequential  Regression  Algorithm  (SER)  L5j, 
were  evaluated  for  the  adaptive  digital  predictor. 
Performance  was  evaluated  using  signal  to  noise  ratio 
measurements . 

It  has  been  recognized  that  pitch  estimation  is  not 
simple  if  the  speech  is  corrupted  by  noise  [6].  The  cur¬ 
rently  popular  pitch  extraction  methods  are  generally  of  two 
types,  autocorrelation  analysis  and  harmonic  analysis.  The 
autocorrelation  method  performs  an  autocorrelation  on  the 
windowed  speech  data  and  the  pitch  period  for  the  voiced 
speech  is  estimated  by  using  the  peaks  in  the 
autocorrelation  function.  On  the  other  hand,  the  pitch 
extraction  methods  are  based  on  performing  a  discrete 
Fourier  transform  on  the  windowed  speech  data  [7].  However, 
these  methods  do  not  fare  well  for  noi3y  speech.  Chapter 
III  presents  a  method  for  determining  the  pitch  period  of  a 
voiced  speech  signal  using  a  tapped  delay  line  adaptive 
digital  filter  (TDLADF).  These  filters  have  been  widely 
used  in  sonar  applications  i_  3—9  J  *  This  approach  uses  the 
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TDLADF  to  estimate  the  time  delay  between  two  part3  of  the 
same  signal.  The  time  delay  then  corresponds  to  the  pitch 
of  the  signal.  This  time  delay  is  estimated  by  processing 
the  weights  of  the  TDLADF. 

Several  prefiltering  methods  based  upon  adaptive  tech¬ 
niques  are  discussed  in  Chapter  IV.  One  popular  method  of 
prefiltering  speech  involves  direct  modification  of  the 
short-time  Fourier  transform  (STFT)  of  a  speech  signal  typ¬ 
ically  called  spectral  subtraction  [lO].  This  method  esti¬ 
mates  the  STFT  of  the  background  noi3e  and  then  subtracts 
this  from  the  STFT  of  the  speech  plus  background  noise.  The 
work  described  in  Chapter  IV  is  limited  to  filtering  tech¬ 
niques  for  suppressing  narrowband  background  noise  in  speech 
signals.  The  methods  included  are  a  modified  spectral  sub¬ 
traction  technique,  an  inverse  transform  filter,  an  adaptive 
notch  placement  technique  and  an  adaptive  filter 
technique.  These  methods  are  evaluated  using  signal  to 
noise  ratio  measurements  and  log  area  ratio  performance 
measurements . 

Chapter  V  presents  some  interesting  image  processing 
techniques  to  enhance  speech.  In  this  approach,  the  STFT 
representation  of  a  segment  of  speech  in  treated  as  a  two- 
demensional  image  data  [ll].  It  has  been  shown  that  by  the 
use  of  image  processing  operations,  such  as  contrast 
enhancement  and  smoothing,  background  noise  can  be  sup¬ 
pressed  and  the  speech  signal  can  be  enhanced. 
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Past  Recursive  Estimators 


Real  time  implementation  of  the  before  mentioned  filter 
techniques  requires  considerable  computational  power.  It  is 
important  that  all  the  alogorithms  involving  in  the  encoding 
process  be  as  efficient  as  possible.  During  the  course  of 
this  study,  fast  algorithms  have  been  developed  for  linear 
estimators  for  rapid  estimation  of  signal  in  noise.  Chapter 
VI  presents  a  simple  and  an  efficient  algorithm  for  the 
solution  of  a  generalized  least  squares  estimation  prob¬ 
lem.  Chapter  VII  presents  a  recursive  linear  estimator  for 
rapid  estimation  of  a  signal  in  noise.  Efficient  methods 
are  developed  for  optimization  of  the  filter  coefficients. 
Optimal  selection  of  data  to  be  precessed  i3  shown  to  be 
related  to  a  classic  integer  programming  problem  [12-13]. 

Finally,  Chapter  VIII  summarizes  the  results  and 
provides  suggestions  for  further  study. 


CHAPTER  II 


PREFILTERING  WITH  AN  ADAPTIVE  DIGITAL  PREDICTOR 

Although  linear  preditive  coding  (LPC)  techniques  used 
to  Encode  narrowband  speech  signals  are  efficient  at  data 
compression  [6,14],  they  degrade  significantly  if  the  speech 
is  corrupted  with  noise  [l,2,15]*  Frequency  domain  filter¬ 
ing  techniques  using  pitch  period  information  have  been 
evaluated  and  found  to  be  of  limited  usefulness  [16].  The 
objective  of  this  Chapter  is  to  examine  the  performance  of  a 
time  domain  filtering  technique  using  an  estimate  of  the 
pitch  period  and  an  adaptive  digital  predictor  (ADP),  Figure 
1,  to  reduce  the  noise  to  speech  signals  [3]« 

The  noise  filtering  properties  of  the  ADP  with  pitch 
period  delay  are  examined.  The  primary  areas  discussed  here 
are  the  ADP's  performance  with  various  noise  types.  Per¬ 
formance  will  be  evaluated  using  signal-to-noise  ratio  (SNR) 
measurements.  This  performance  measure  is  used  for  conven¬ 
ience  in  the  parameter  sensitivity  investigations  although 
it  is  recognized  that  intelligibility  is  not  always  a  func¬ 
tion  of  objective  measures  of  speech  quality. 

Two  adaptive  algorithms,  the  LMS  [4]  and  the  SER  [5], 
were  evaluated  for  the  ADP.  These  algorithms  are  discussed 
in  the  next  section.  Synthetic  speech  was  used  in  all 


evaluations.  The  section  on  Data  Generation  discusses  how 
this  speech  was  generated.  The  effect  of  pitch  estimate 
errors  on  performance  is  dealt  with  in  the  Pitch  Estimate 
Error  section.  The  ADP's  performance  for  various  types  of 
noise  is  the  subject  of  the  Narrowband  Noise  section. 


Filter  Configurations 


Two  adaptive  algorithms  will  be  considered,  the  LMS 
algorithm  [4]  and  the  3ER  algorithm  [5]«  The  LMS  algorithm 
i3  a  suboptimal  least  squares  approach  derived  using  tne 
method  of  steepest  descent.  The  3ER  algorithm  is  optimum  in 
the  least  squares  sense  and  depends  upon  the  matrix 
inversion  lemma  to  compute  the  inverse  of  the  auto¬ 


correlation  matrix  at  each  iteration.  If 
f£  =  f(k-A)  f(k-l-A)  ...  F(k-M-l-A) 


(2.1) 


i3  the  input  vector  to  the  adaptive  filter  of  length  M,  then 
the  LMS  algorithm  [4]  is  given  by 

Ar  =  Ar-1  +  vFr  e(r) ,  (2.2) 


where  Ar  is  the  vector  of  filter  coefficients, 
e(r)  =  f(r)  -  F^Ar_1 


(2.3) 


as  shown  in  Figure  1,  and  v  is  a  convergence  parameter.  The 


3ER  algorithm  [5]  is  given  by 

Ar  '  Ar-1  +  6r  Fr  *<r) 

where  e(r)  is  defined  by  (2.3)  with 

Qr.  =  Qr._i  -  (1/6)  Q„_i  F_  F?  3 


(2.4) 


(2.5) 


Pitch  EitiMti 


Figure  1.  ADP  Configuration 

where  Ar  is  again  the  coefficient  vector,  Fr  is  given  by 
(2.1)  and  Qr  is  the  inverse  of  the  autocorrelation  matrix  of 
the  input  to  the  ADP.  Note  that  (2.5)  and  (2.6)  update  Qr 
through  the  matrix  inverse  lemma. 

Data  generation 

In  these  simulations  the  test  data  was  limited  to  a 
single  vowel  sound  /a/.  This  was  done  in  order  that  the 
dynamics  of  the  adaptive  filter  itself  may  be  observed  with¬ 
out  the  added  complexity  of  interaction  of  two  phonemes.  To 
generate  the  vowel  sound  used  for  the  test,  natural  speech, 
sampled  at  8000  Hz,  was  first  analyzed  by  linear  prediction 
analysis  (LPA).  An  eighth  order  model  was  used.  The 
resulting  LPA  parameters  were  then  used  to  synthesize  the 
test  waveform.  In  this  way  the  pitch  period  and  spectral 
characteristics  of  the  test  signal  were  completely  known. 
Figure  2  shows  the  waveform  and  spectrum  of  the  test  signal. 
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Figure  2.  Teat  signal  time 

and  frequency  plot. 

Noise  signals  for  the  test  were  chosen  to  be  simple  but 
representative  of  the  general  types  encountered.  The  three 
noise  signals  used  were  wideband,  narrowband  and  a  single 
sinewave.  The  wideband  noise  was  generated  by  a  Gaussian 
psuedo-randora  number  generator.  For  the  narrowband  noise, 
the  wideband  generator  output  was  passed  through  a  digital 
resonator  with  center  frequency  at  2664  Hz  and  bandwidth  of 
400  Hz.  The  sinewave  noise  consisted  of  a  single  sinewave 
of  constant  amplitude  at  a  frequency  of  2664  Hz.  Figure  3 
shows  the  spectra  of  the  noise  signals  used. 


Pitch  Estimate  Error 

An  essential  part  of  the  filtering  technique  is  the 
estimate  of  the  pitch  period.  It  would  be  useful  to  know 
just  how  accurate  the  estimate  must  be  to  allow  the  filter 
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to  perform  satisfactorily.  Figure  4  shows  the  results  of  a 
computer  simulation  of  the  ADP  in  which  the  pitch  estimate 
given  to  the  ADP  i3  varied  from  tne  actual  pitch  period  of 
the  speech  signal.  The  input  signal  was  corrupted  to  1.3  aS 
SNR  at  the  input  of  the  filter  with  the  wideband  noise  spec¬ 
ified  in  the  previous  section.  The  SNR  of  the  output  is 
plotted  against  the  pitch  estimate  (in  data  sample  periods) 
used  by  the  ADP.  Curves  are  shown  for  four  and  eight  weight 
ADP's.  One  should  note  that  the  filter  performs  properly 
only  when  the  estimated  pitch  falls  within  a  window  near  the 
actual  pitch  period.  The  width  of  this  window  is  approx¬ 
imately  equal  to  the  number  of  weights  in  the  filter.  The 
plot  in  Figure  5  extends  from  pitch  estimates  +60  to  -60.  A 
minus  pitch  estimate  means  that  the  delay  is  in  the  ref¬ 
erence  input  side  of  the  adaptive  filter,  3ee  Figure  1, 
instead  of  the  filter  side.  One  should  note  that  there  are 
two  more  acceptance  windows  located  at  zero  and  -50.  Only 
the  windows  located  at  +50  and  -50  give  improvement  in  SNR 
over  the  unfiltered  signal.  Note  also  that  total  signal 
improvement  is  higher  if  the  pitch  estimate  is  slightly 
lower  than  the  actual  signal  pitch.  These  results  imply 
that  the  pitch  estimate  need  not  be  perfect  but  must  fall  in 
a  window  bounded  on  the  high  side  by  the  actual  pi  ten  and 
with  width  approximately  that  of  the  filter  weights.  Tne 
results  in  Figure  4  and  5  are  for  the  LMS  algorithm.  The 
SER  algorithm  shows  the  same  window  effect.  The  window 
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width  was  also  found  to  be  relatively  insensitive  to  the 
type  and  intensity  of  the  noise  used  to  corrupt  the  test 
signal . 

Narrowband  Noise 

Using  real  speech,  Sambur  has  shown  that  this  ADP  con¬ 
figuration  provides  improvement  in  SNR  in  the  case  of  wide¬ 
band  noise  [  3  ]  •  More  challenging  though  is  the  case  of 

narrowband  noise  such  as  is  found  in  many  industral  and 
military  environments.  Figure  6  shows  the  overall 

performance  of  the  filter  with  the  LMS  algorithm  for  the 
single  vowel  sound  /a/  corrupted  by  varying  intensities  of 
wideband,  narrowband  and  single  sinewave  noise.  SNR  of  the 
output  is  plotted  against  the  SNR  of  the  input.  Figure  7 
shows  the  same  results  for  the  SER  algorithm.  In  each  case 
the  filter  was  allowed  to  run  to  convergence  before  actual 
SNR  calculations  were  made.  One  should  note  that  in  all  but 
the  sinewave  noise  case,  some  improvement  in  SNR  was 

realized.  Also  note  that  the  sub-optimal  LMS  algorithm 
provided  more  improvement  in  SNR  than  the  SER  algorithm. 

Conclusion 

A  technique  using  an  adaptive  digital  predictor  with 
pitch  period  delay  to  reduce  noise  in  speech  signals  has 
been  examined.  It  has  been  shown  for  the  case  of  a  simple 
vowel  sound  that  wideband  and  narrowband  noise  can  be 
reduced.  However,  single  sinewave  interference  was  not 
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CHAPTER  III 


PITCH  ESTIMATION  BY  ADAPTIVE  FILTERING 

Estimation  of  the  fundamental  frequency  or  "pitch"  of  a 
voiced  speech  signal  is  one  of  the  basic  steps  in  most 
speech  analysis  and  speech  encoding  systems  [ 1 7 J -  The  cur¬ 
rently  popular  pitch  extraction  methods  are  generally  of  two 
types,  autocorrelation  analysis  and  harmonic  analysis.  The 
autocorrelation  technique  performs  an  autocorrelation  on  the 
windowed  speech  data  [6].  If  the  windowed  data  contains 
several  pitch  periods,  the  resulting  autocorrelation 
function  will  have  a  peak  at  the  delay  corresponding  to  the 
pitch  period.  Rather  than  work  with  the  raw  speech  data, 
most  schemes  will  lowpass  filter  or  center  clip  or  both 
before  performing  the  autocorrelation.  This  can  enhance  the 
result  and  lower  the  computation  required.  In  some  schemes 
the  input  to  the  autocorrelation  process  is  the  prediction 
residual  from  an  appropriate  linear  prediction  algorithm. 
In  any  case,  the  pitch  is  estimated  by  identifying  the  ap¬ 
propriate  peak  in  the  autocorrelation  function. 

Pitch  extraction  methods  based  on  harmonic  analysis 
usually  start  by  performing  a  discrete  Fourier  transform  on 
the  windowed  speech  data  [7j.  The  transformed  data  is  ex¬ 
amined  to  locate  the  line  spectra  features  that  are  charac¬ 
teristic  of  pitch  periodic  time  signals.  Spacing  of  the 


line  spectra  features  can  then  be  used  to  estimate  the  pitch 
frequency. 

This  chapter  presents  a  method  for  determining  the 
fundamental  frequency  of  a  speech  signal  using  a  tapped 
delay  line  adaptive  digital  filter  (TDLADP).  Like  the 
autocorrelation  method,  thi3  technique  tries  to  measure  the 
time  delay  between  successive  pitch  period  waveforms.  The 
use  of  TDLADP  to  estimate  the  time  delay  between  two  signals 
has  been  well  developed  for  use  in  sonar  applications 
[8,  9]«  Instead  of  estimating  the  delay  between  two 
different  signals,  this  application  uses  the  TDLADP  to 
estimate  the  time  delay  between  two  parts  of  the  same 
signal.  The  weights  of  the  TDLADP  are  processed  to 
determine  the  pitch  estimate. 

The  General  Time  Delay  Case 

Figure  8  shows  the  general  structure  of  a  TDLADF  with 
its  filter  input  and  reference  input.  If  a  signal  is  ap¬ 
plied  to  the  filter  input  and  a  delayed  version  of  the  same 
signal  is  applied  to  the  reference  input,  the  adaptive  algo¬ 
rithm  will  minimize  the  error  signal  by  adjusting  the 
weights  of  the  tapped  delay  line  filter  to  approximate  the 
unknown  delay  in  the  reference  input.  The  resulting  con¬ 
verged  filter  will  then  have  a  weight  with  a  value  of  one  at 
the  delay  line  tap  that  correponds  to  the  unknown  delay  and 
zeros  in  all  the  other  weights.  In  most  cases  it  is  not 
necessary  to  wait  until  the  adaptive  algorithm  converges  to 


determine  the  time  delay  [ S j .  After  only  a  few  iteration 
steps,  a  scan  of  the  weight  values  for  the  maximum  positive 
value  gives  a  very  close  estimate  to  the  actual  time  delay. 

The  Pitch  Estimation  Configuration 

Figure  9  shows  the  TDLADF  configured  for  pitch  period 
estimation.  The  filter  input  delay  is  established  by  the 
shortest  pitch  period  expected  (approximately  3-5  ms).  The 
length  of  the  tapped  delay  line  filter  is  then  determined  by 
the  longest  pitch  period  expected  (usually  15-20  ms).  For 
example,  an  expected  pitch  period  range  from  5  ms  to  1 5  ms 
would  require  a  delay  of  50  samples  and  filter  length  of  100 
samples  if  the  sample  rate  was  10kHz.  In  the  configuration 
of  Figure  9,  the  TDLADF  approximates  the  delay  between  two 
successive  pitch  periods.  Since  successive  pitch  waveforms 
of  natural  speech  are  not  exactly  alike,  the  weights  will 
never  converge  to  a  single  impulse.  Still,  a  clear  peak  in 
the  weight  function  will  be  evident. 

Figure  10  gives  a  plot  of  the  TDLADF  weights  for  the 
utterance  "They  shook  hands  for  good  luck."  The  seech  data 
for  this  example  was  sampled  at  8kHz.  The  example  used  a  40 
weight  adaptive  filter  with  a  40  sample  delay  on  the  input. 
In  this  plot,  the  relative  darkness  of  any  point  shows  the 
value  of  a  particular  weight  at  that  point  in  time.  The 
more  positive  the  value  of  the  weight,  the  darker  it  appears 
on  the  plot.  Since  the  presence  of  a  pitch  periodic  signal 
will  be  indicated  by  a  positive  peak  in  the  weights,  only 
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the  positive  values  of  weights  are  shown.  The  changes  in 
the  pitch  period  are  clearly  visible  as  the  peak  in  the 
weight  values  tracks  the  delay  between  successive  pitch 
periods.  Note  that  a  singular  peak  is  not  discernible 
during  the  unvoiced  "sh"  of  the  word  shook. 

The  Adaptive  Algorithm 

The  adaptive  algorithm  used  for  the  example  of  Figure 
10  is  based  on  Widrow's  LMS  adaptive  algorithm  [4].  In  this 
algorithm  the  weights  of  the  tapped  delay  line  are  updated 
by  the  relation 


W(n+1)  =  W(n)  +  u  e(n)  F(n) 

where  W(n)  is  the  vector  of  weights,  u  is  the  convergence 
parameter,  F(n)  is  the  input  signal  vector,  and  e(n)  is  the 
error  at  step  n.  The  LMS  algorithm  is  one  of  the  slower 
converging  adaptive  algorithms.  This  is  not  a  problem  in 
this  application  since  complete  convergence  is  neither 
required  nor  wanted  for  proper  pitch  estimation.  It  has 
been  found  that  the  weights  in  the  LMS  algorithm  can  more 
quickly  track  a  varying  pitch  period  if  the  algorithm  is 
prevented  from  converging  completely.  This  is  done  by 
adding  a  relaxation  parameter  to  the  weight  update  equation 
£  1 9 j •  The  modified  update  equation  is 

W( n+1 )  =  vW(n)  +  ue(n)  F(n) 
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where  v  is  the  relaxation  parameter.  Suitable  values  of  v 
were  found  to  be  in  the  range  of  .90  to  .99*  Other  adaptive 
algorithms  may  be  suitable  for  this  application.  This  study 
in  concerned  with  the  LMS  algorithm. 

Visible  Detail 

Since  the  TDLADP  generates  an  estimate  of  the  pitch  at 
each  incoming  data  sample,  it  can  reveal  subtle  variations 
in  the  pitch  period  that  would  go  unnoticed  with  other  pitch 
estimation  schemes.  Both  the  autocorrelation  and  harmonic 
analysis  techniques  must  work  with  data  windows  that  are 
several  pitch  periods  wide  to  get  good  results.  Subtle 
variations  in  the  pitch  are  averaged  out  in  the  process. 

Figure  1 1  shows  an  expanded  plot  of  the  start  of  the 
word  "they”  from  the  example  in  Figure  10.  The  time  plot  of 
the  actual  signal  is  included  with  the  TDLADF  weight  plot. 
Small  changes  in  the  pitch  period  from  one  pitch  period  to 
the  next  are  evident  in  the  weight  plot.  Note  the  jump  in 
the  pitch  period  associated  with  the  change  in  voicing. 
Even  more  subtle  features  of  the  pitch  period  variations  may 
be  observed  if  an  adaptive  algorithm  with  faster  convergence 
characteristic  is  used.  Figure  12  shows  the  same  example 
signal  of  Figure  11  processed  by  a  sequential  regression 
(SER)  adaptive  algorithm  [20].  The  SER  algorithm  is 
computationally  intensive  but  does  converge  quickly  and, 
therefore,  more  detail  can  be  seen  in  the  weight  plot. 


Pitch  Period  Resolution 

Normally,  the  time  resolution  of  the  pitch  period  esti¬ 
mate  would  be  limited  to  the  data  sampling  interval.  The 
estimate  can  be  improved  by  using  a  parabolic  interpolation 
over  several  weight  values  in  the  vicinity  of  the  peak 
weight . 

Noise  Tolerance 

The  original  motiviation  for  using  an  adaptive  filter 
for  pitch  detection  occurred  while  studying  the  effect  of 
background  noise  on  low  bit  rate  speech  coding  systems.  The 
adaptive  technique  was  found  to  be  very  robust  in  the 
presence  of  various  kinds  of  noise.  Rigorous  comparisons 
with  other  methods  have  yet  to  be  made.  Figure  13  gives  the 
results  of  several  tests  that  calculated  the  average  pitch 
estimate  error  for  various  input  signal  to  noise  ratios.  It 
has  been  found  that  the  slow  convergence  rate  of  the  LMS 
adaptive  algorithm  automatically  provides  much  of  the 
smoothing  needed  by  other  methods  in  the  presence  of  noise 
[21]. 

Computational  Considerations 

From  a  computational  standpoint,  the  TDLADF  for  pitch 
detection  is  very  costly.  The  LMS  algorithm  requires 
approximately  200  multiplies  per  data  sample.  Since  most  of 
the  pitch  information  is  contained  in  the  lower  1000  Hz  of 
the  spe°ch  signal,  the  data  can  be  lowpass  filtered  and  then 
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down  sampled.  The  number  of  computations  is  cut  by  about 
the  down  sampling  rate.  Down  sampling  for  pitch  detection 
is  also  common  among  autocorrelation  methods. 

One  side  effect  of  the  down  sampling  is  reduced  pitch 
period  resolution.  Interpolation  will  help,  but  the  TDLADP 
technique  offers  another  alternative.  Figure  14  shows  a 
block  diagram  for  a  two  stage  adaptive  pitch  estimator.  The 
"course"  stage  uses  the  lowpass  filtered,  down  sampled 
version  of  the  signal  to  obtain  an  approximate  value  of  the 
pitch  period.  This  estimate  is  then  used  to  adjust  the 
delay  on  the  second  "fine"  stage.  The  second  stage  TDLADF, 
though  not  down  sampled,  need  only  be  wide  enough  to  accom¬ 
modate  the  expected  error  in  the  first  estimate.  Processing 
the  weights  in  the  "fine"  stage  gives  the  same  resolution  as 
the  original  sampled  case  but  at  lower  computational  cost. 
Initial  test  with  this  structure  showed  that  the  compu¬ 
tational  savings  were  bought  at  the  cost  of  slightly  lower 
noise  tolerance. 

Possible  Applications 

Most  interest  in  pitch  estimation  techniques  centers 
around  the  real  time  encoding  of  speech  data.  The  TDLADF 
does  not  seem  to  be  well  suited  to  this  application  due  to 
the  computational  requirements.  Recent  development  of  VLSI 
circuits  to  implement  the  adaptive  filter  algorithms 
directly  in  hardware  may  eventually  do  away  with  that  limi¬ 


tation. 


Other  applications  for  pitch  estimation  are  in  disease 
diagnosis.  Detailed  study  of  the  speech  waveforms  of  some 
patients  can  aid  in  the  detection  of  certain  disorders  of 
the  vocal  tract  and  nervous  system  [22].  The  ability  of  the 
TDLADP  to  display  subtle  features  in  the  pitch  period  may 
prove  useful  for  such  diagnosis.  The  computational  load 
should  not  present  a  problem  in  this  application  since  the 
analysis  need  not  be  done  in  real  time. 
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Chapter  IV 

NOISE  SUPPRESSION  METHODS 
FOR  SPEECH  APPLICATIONS 

Use  of  linear  predictive  coding  and  other  speech  encod¬ 
ing  techniques  in  military  environments  has  revealed  some 
disappointing  limitations  in  the  narrowband  digital  encoding 
of  speech.  Moderate  acoustic  background  noise  can  severely 
degrade  the  overall  system  performance  [2].  This  Chapter 
investigates  both  frequency  and  time  domain  noise  suppres¬ 
sion  techniques  that  appear  to  be  effective  on  background 
noise  with  non-stationary  narrowband  characteristics. 

Methods  Investigated 

Subtraction  of  an  estimated  noise  spectrum  in  the 
frequency  domain  has  been  shown  to  be  very  effective  on 
narrowband  interference  signals  [23,24].  Two  modif icat i  ons 
to  the  original  technique  were  investigated  as  a  part  of 
thia  study.  First,  placing  zeros  in  suspected  large  inter¬ 
ference  bands  was  found  to  improve  the  subjective  quietness 
of  the  speech.  Secondly,  it  was  also  found  that  a  dual  time 
constant  spectral  averager  allows  for  better  noise  spectrum 
estimation. 

Time  domain  filters,  designed  to  track  and  reject  sus¬ 
pected  narrowband  noise  signals,  have  the  potential  of  being 
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more  computationally  efficient  than  frequency  transform 
methods.  Four  methods  for  designing  time  domain  filters 
were  investigated.  First,  a  time  domain  filter  may  be 
formed  from  the  proper  inverse  transform  of  the  inverse  of 
the  estimated  noise  spectrum  [25 J.  Another  technique 
searches  the  noise  spectrum  for  the  largest  suspected  inter¬ 
ference  signal  and  then  designs  a  time  domain  notch  filter 
with  appropriate  center  frequency  and  bandwidth.  The  third 
technique  is  an  adaptive  predictor  [18]  which  adaptively 
designs  a  time  domain  noise  suppression  filter  based  on  a 
sample  input  of  the  background  noise.  The  fourth  is  a  mod¬ 
ification  of  the  adaptive  predictor  to  allow  for  flatter 
passbands. 

General  Spectral  Subtraction 

The  spectral  subtraction  noise  reduction  method  is  the 
process  of  subtracting  an  estimate  of  the  noise  power  spec¬ 
tral  density  (PDS)  from  the  corrupted  input  signal's  PSD  in 
an  attempt  to  improve  the  signal-to-noise  ratio  (SNR).  This 
subtraction  is  adaptive  in  that  the  estimate  of  the  noise 
PDS  is  updated  during  the  absence  of  speech.  The  algorithm 
involves:  1)  breaking  the  input  signal  into  frames  and 
estimating  the  PSD  of  each  frame  by  an  FFT,  2)  updating  the 
estimate  of  the  noise  PSD  if  no  speech  is  present,  5)  sub¬ 
tracting  the  current  noise  PSD  from  the  signal  PSD  and  4) 
transforming  the  frequency  domain  result  back  into  the  time 
domain  using  the  input  signal's  original  phase  information. 
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The  Noise  PSD  Estimate 


The  performance  of  the  spectral  subtraction  technique 
depends  a  great  deal  on  the  accuracy  of  the  estimation  of 
the  noise  PSD.  Errors  in  the  noise  PSD  estimate  usually 
show  up  as  tone  burst  or  spectral  artifacts  in  the  output 
signal.  If  the  noise  signal  is  assumed  to  be  stationary  on 
a  short  time  basis,  then  some  form  of  smoothing  of  the  frame 
to  frame  variations  will  improve  the  noise  PDS  estimation. 
A  first  order  recursive  smoother  was  used  to  average  each 
descrete  frequency  power  estimate  over  several  frames.  The 
recursive  smoother  computes  the  running  average  of  each 
discrete  frequency  by 

En(k)  =  Pn(k)  +  c(En.1(k)  -  Pn(k)) 
where  En  is  the  recrusive  estimate  of  the  PSD  based  upon  Pn, 
the  noise  PSD  in  the  nth  frame,  c  is  a  constant  which  sets 
the  effective  time  constant  for  the  average,  and  k  is  the 
decrete  frequency  index.  No  smoothing  was  done  between 
adjacent  discrete  frequencies.  The  appropriate  time  con¬ 
stant  for  the  smoother  depends  on  the  frame  to  frame  varia¬ 
bility  of  the  noise.  A  fixed  time  constant  of  about  5 
frames  worked  reasonably  well.  However,  a  dual  time  con¬ 
stant  scheme  based  upon  adjacent  power  estimates  proved 
better.  If  the  new  power  estimate  of  a  particular  deicrete 
frequency  is  greater  than  the  last  estimate  then  a  shorter 
time  constant  (1-2  frames)  is  used.  If  the  new  estimate  is 
less  than  the  current  estimate  then  a  long  time  constant  (4- 
5  frames)  is  applied.  This  "fast  attack  —  slow  decay" 


scheme  reduced  the  number  of  spectral  artifacts  due  to  ran¬ 
dom  variations  in  the  noise. 


Frequency  Domain  Notch  Filter 

The  difficulty  with  very  high  intensity  narrowband 
noise  is  that  the  spectral  artifacts  due  to  the  noise  PSD 
estimation  errors  can  be  very  large  in  the  vicinity  of  the 
noise  concentration.  Smoothing  is  impractical  to  suppress 
these  large  artifacts.  It  was  found,  however,  that  total  el¬ 
imination  of  the  speech  PSD  in  the  area  of  the  large  narrow- 
band  noise  reduced  the  number  of  large  artifacts  without 
greatly  distorting  the  speech  signal. 

After  the  noise  PSD  estimate  is  subtracted  from  the 
input  signal  PSD,  zeros  are  placed  in  the  resulting  PSD  at 
discrete  frequency  locations  that  have  abnormally  large 
noise  power  estimates.  This  eliminates  any  large  noise 
artifacts  that  might  remain  due  to  error  in  the  noise  PSD 
estimate.  Emperical  experiments  showed  that  any  discrete 
frequency  in  the  noise  estimate  that  has  a  power  level  over 
four  times  the  average  over  all  frequencies  was  a  good  can¬ 
didate  for  a  frequency  domain  zero. 

Filter  Design  by  Inverse  Transform 

Once  the  noise  PSD  has  been  estimated,  a  suppression 
filter  for  the  noise  is  easily  designed  by  taking  the 
inverse  DFT  of  the  inverse  of  the  PSD  [25].  The  resulting 


impulse  response  forms  the  weights  of  a  transversal 


filter . 


Linear  phase  response  may  also  be  obtained  by 


proper  choice  of  phase. 

To  keep  the  computational  load  reasonable,  the  trans¬ 
versal  filter  must  be  limited  to  between  10  and  20 
weights.  This  requires  sampling  the  noise  PSD  at  only  10  to 
20  equally  spaced  frequencies.  The  resulting  poor  frquency 
resolution  allows  very  narrowband  noise  to  be  ignored  if  it 
falls  between  frequency  samples.  For  this  reason,  no  per¬ 
formance  comparisons  were  made  for  this  filter  technique. 

Filter  Design  by  Notch  Placement 

Since  many  of  the  background  noises  encountered  contain 
one  predominant  narrowband  component,  it  is  unnecessary  to 
use  a  filter  with  as  many  coefficients  as  is  generally 
required  by  the  inverse  transform  method.  A  much  more 
heuristic,  yet  very  practical  approach,  is  to  direct  the 
noise  filter  design  process  directly  toward  the  major  spec¬ 
tral  component  of  the  noise. 

The  method  considered  here  first  analizes  the  estimated 
power  spectrum  of  the  noise  to  identify  the  maximum.  This 
maximum  is  then  the  primary  target  of  the  noise  filter. 
Using  the  amplitude  and  bandwidth  of  the  local  maximum  found 
in  the  noise  spectrum,  a  recursive  notch  filter  is 
designed.  The  notch  filter  used  in  this  research  was  a 
bilinear  transformation  of  the  transfer  function, 


where  u>0  is  the  frequency  of  infinite  attenuation  and  is 
the  -odB  bandwidth  centered  about  uQ.  This  filter  was 
chosen  because  its  gain  at  the  high  and  low  frequency 
extremes  approach  unity  resulting  in  minimal  amplitude  dis¬ 
tortion  for  portions  of  the  spectrum  that  are  not  near  the 
center  frequency. 

The  values  of  and  are  obtained  by  analizing  that 
estimated  noise  PSD  in  the  following  manner.  First,  the 
current  estimate  of  the  noise  PSD  is  searched  for  the  larg¬ 
est  peak  that  is  above  a  set  threshold  value.  The  notch 
filter  center  frequency  is  then  set  to  the  frequency  of  this 
peak.  The  bandwidth  of  the  notch  filter  is  set  equal  to  the 
width  of  the  peak  that  extends  above  the  threshold  value. 
The  threshold  value  was  chosen  to  be  four  times  the  average 
value  of  the  current  noise  PSD.  The  factor  four  was  chosen 
after  many  experiments  with  various  noise  sources. 

The  advantages  of  this  recursive  notch  filter  method 
center  around  the  speed  with  which  such  a  filter  may  be 
implemented.  The  recursive  filter  requires  very  few  opera¬ 
tions  per  data  sample  and  the  filter  parameters  need  to  be 
calculated  only  once  per  frame.  Still,  the  most  time  con¬ 
suming  operation  is  the  calculations  of  the  noise  PSD  esti¬ 
mate.  Hardware  FFT  processors  should  make  this  practical. 

Some  background  noise  sources,  such  as  helicopter 
noise,  contain  more  than  one  narrowband  noise  component. 
The  above  method  can  be  extended  by  cascading  several  notch 


filters.  Each  filter  would  adapt  and  track  each  narrowband 
noise. 


Filter  Design  by  Adaptive  Filter 


Design  of  noise  suppression  filters  by  linear  predic¬ 
tion  has  been  shown  effective  for  narrowband  noise  [25]. 
Figure  15  shows  the  block  diagram  of  an  adaptive  predictor 
for  noise  suppression  in  speech.  When  a  noise  signal  is 
applied,  the  adaptive  algorithm  adjusts  the  weights  of  the 
transversal  filter  to  minimize  the  error  signal.  As  the 
algorithm  converges,  the  transfer  function  of  the  system 
approximates  the  inverse  filter  that  would  be  required  to 
suppress  the  input  noise  signal.  When  speech  is  detected, 
the  adaptive  algorithm  is  turned  off  and  the  filter  weights 
are  held  constant  at  their  current  values.  Adaptation 
resumes  when  speech  is  no  longer  indicated.  For  this 
research,  the  least  mean  square  (LMS)  algorithm  [18]  was 
used  for  the  automatic  adjustment  of  the  weights. 
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Figure  15*  Adaptive  Predictor  for  Speech 
Filtering 
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Figure  16.  Adaptive  Transfer  Filter 
In  these  experiments,  white  psuedo-random  noise  is  used 
for  the  bias  noise  source.  The  flattness  of  the  passbands 
and  the  depth  of  the  stopbands  are  controlled  by  the  bias 
noise  to  input  noise  ratio.  The  power  of  the  noise  bias  was 
set  based  on  experimental  results  to  be  one-fourth  the  ex¬ 
pected  average  power  of  the  input  speech  signal.  Flatter 
passbands  and  deeper  notches  could  also  be  achieved  by  using 
more  weights  in  the  filter. 


Comparison  of  Filter  Methods 


A  series  of  tests  were  conducted  to  evaluate  each  of 
the  filter  methods  previously  described  except  the  inverse 
transform  method.  The  noise  used  for  these  experiments  was 
a  sample  of  noise  recorded  in  a  RH-53  helicopter.  All  the 
tests  were  run  at  a  simulated  sample  rate  of  8000  Hz.  The 
four  techniques  evaluated  were:  a)  Spectral  subtraction 

(SSB)  with  noise  PSD  smoothing  the  frequency  domain  notch 
placement,  b)  Adaptive  notch  placement  (ANF)  using  two 
second  order  notch  filters,  c)  Adaptive  predictor  filter 


(APF)  using  a  15  weight  filter,  d)  Adaptive  transfer  filter 
(ATF)  using  a  15  weight  filter.  The  performance  of  each 
filter  was  compared  using  a  signal-to-noise  ratio  calcula¬ 
tion  based  on  the  average  spectral  error  in  the  corrupted 
signal  before  and  after  filtering.  The  spectral  error  was 
averaged  over  the  speech  portions  of  the  input  signal 
only.  Also,  performance  was  compared  by  computing  the  av¬ 
erage  log  area  ratio  (LAR)  error  for  a  10th  order  linear 
predictor  [26]. 

Objective  Comparisons 

Figure  17  shows  the  input  and  output  signal-to-noise 
ratios  for  various  levels  of  input  noise.  Figure  18  shows 
the  LAR  error  for  various  levels  of  input  noise.  Note  that 
only  the  SSB  and  ATF  methods  consistently  improve  SNR  and 
LAR  error.  Note  also  that  the  ATF  scored  best  in  SNR  im¬ 
provement  but  the  SSB  proved  better  in  LAR  error  reduc¬ 
tion.  The  deep  narrow  notches  formed  by  zero  placement  in 
the  SSB  filter  cause  much  spectral  error  but  are  easily 
smoothed  over  by  the  relatively  low  order  linear  predic¬ 
tor.  The  APF  performed  poorly  due  to  its  tendency  to  dis¬ 
tort  the  spectrum  in  the  passbands  of  the  filter.  The  ANF 
successfully  suppressed  the  large  narrowband  noise  compon¬ 
ents  but  did  not  filter  the  subtiler  narrowband  noise  that 


was  below  its  threshold. 
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Figure  17-  SNR  Performance 


Figure  18.  Log  Area  Ratio  Performance 


Subjective  Comparisons 

Informal  listening  tests  indicated  that  all  the  methods 
evaluated  reduced  the  perceived  level  of  background  noise. 
Casual  listening  greatly  favored  the  SSB  technique.  The  3SB 
metnod  was  the  only  one  to  remove  entirely  the  large  narrow- 
band  components  of  the  test  helicopter  noise.  All  the  other 
methods  had  audible  residuals  of  the  major  noise 

ill 


components.  The  ATP  method  had  the  least  distortion,  while 
the  APP  tended  to  emphasize  the  higher  frequencies. 

Conclusions 

Of  the  four  filter  methods  evaluated,  none  ofie  a 
complete  cure  for  background  noise  in  speech.  Spectral 
subtraction  with  zero  placement  and  the  adaptive  transfer 
filter  were  the  most  effective.  The  spectral  subtraction 
technique  may  have  an  advantage  in  applications  with  linear 
predictive  coding  since  it  performed  best  in  the  log  area 
ratio  error  evaluation.  Formal  subjective  tests  need  to  be 
done  to  determine  the  effect  these  filters  have  on  the  in¬ 
telligibility  of  noisy  speech.  The  adaptive  notch  placement 
scheme  and  the  adaptive  predictor  were  not  very  effective  on 
the  test  signals  used  but  might  be  useful  in  other  applies- 


CHAPTER  V 


ENHANCEMENT  OF  SPEECH  SIGNALS 
BY  TWO  DIMENSIONAL  SIGNAL  PROCESSING 

Recently  there  has  been  considerable  interest  in 
investigating  new  avenues  for  removing  high  ambient  noise  in 
speech  [l,  2,  27-31]-  Chapter  IV  presented  a  discussion  and 
evaluation  of  several  filtering  techniques  for  suppressing 
background  noise  in  speech  signals.  Spectral  subtraction  is 
an  effective  method  for  removing  narrowband  noise  from 
speech  signals.  However,  there  are  difficulties  with  the 
technique  when  the  noise  is  wideband  or  random  in  nature. 
For  example,  consider  the  white  noise  case.  The  average 
value  of  the  noise  spectrum  can  be  easily  subtracted  from 
the  corrupted  signal  but  the  frame  to  frame  variations  of 
the  noise  will  still  appear  in  the  output  signal  as  chirps 
or  musical  noise.  Over  estimating  the  average  noise  level 
is  helpful  in  removing  this  residual  noise  but  only  at  the 
cost  of  removing  more  speech  signal. 

The  random  nature  of  the  chirp  noise  suggests  that  some 
kind  of  spectral  smoothing  might  be  useful  to  suppress  the 
residual  random  variations  that  remain  after  subtraction  of 
the  average  noise  level.  Smoothing  of  noisy  spectral  data 
to  minimize  the  effects  of  residual  artifacts  in  spectral 


subtraction  has  been  shown  to  be  useful  for  image  resto¬ 
ration  applications  [27].  The  power  spectrum  of  each  frame 
of  data  can  be  smoothed  using  any  one  of  several  techniques. 
The  linear  predictive  coding  process  itself  offers  some 
smoothing  since  only  a  limited  number  of  poles  are  available 
for  modeling  the  signal.  In  channel  vocoder  systems,  some 
spectral  smoothing  can  be  added  by  slight  overlap  of  the 
analysis  channels  [28].  Frame  to  frame  correlation  of 
speech  frequency  data  has  also  been  used  in  channel  vocoders 
for  noise  suppression  [29].  In  such  a  system,  the  output  of 
each  frequency  channel  is  lowpass  filtered  to  remove  any 
rapid  variations.  Logarithmic  filtering  of  the  envelope  of 
individual  frequency  channels  has  been  shown  to  be  useful 
for  enhancing  speech  corrupted  by  white  noise  [30].  Some 
frame  to  frame  smoothing  is  introduced  in  spectral  sub¬ 
traction  if  the  frame  size  and  overlap  are  made  relatively 
large  [31 ] • 

The  acoustic  tube  model  for  speech  production  implies 
that  the  power  spectrum  of  any  one  frame  of  data  will  be 
fairly  continuous  as  a  function  of  frequency.  Also,  since 
the  speech  parameters  do  not  change  rapidly,  the  frame  to 
frame  variations  of  amplitude  of  any  one  frequency  will  also 
be  continuous.  This  dual  continuity  of  speech  spectral  data 
in  both  time  and  frequency  suggests  that  some  type  of  two 
dimensional  filtering  might  be  applicable. 

When  the  speech  power  spectrum  data  is  displayed  as  a 
spectrogram,  an  image  is  formed  with  dimensions  of  time  and 


frequency.  The  spectrogram  makes  visible  both  the  time  and 


frequency  correlation  of  the  speech  spectrum.  This  image 
can  then  be  processed  using  two  dimensional  techniques  to 
smooth,  improve  contrast,  or  otherwise  enhance  the  spectral 
features.  The  resulting  processed  spectrum  can  then  be 
combined  with  the  original  phase  data  and  inverse  trans¬ 
formed  to  recover  the  speech  signal.  This  procedure  has 
been  investigated  for  detecting  single  tones  in  white  noise 
[32].  This  chapter  presents  some  interesting  results  on 
applying  two  dimensional  processing  techniques  to  achieve 
speech  enhancement. 

Two  Dimensional  Representation 

The  speech  signal  is  first  converted  to  the  short-time 
Fourier  transform  (STFT)  domain  by  suitable  sampling,  win¬ 
dowing  and  discrete  Fourier  transformation.  The  transform 
results  in  a  complex  two-dimensional  function,  which  can  be 
represented  in  the  form  M(k,n)</  P(k,n) ,  where  M(k,n) 
corresponds  to  the  magnitude  and  P(k,n)  corresponds  to  the 
phase  with  k  and  n  respectively  representing  the  frequency 
and  the  time  indices.  The  plot  of  M(k,n)  is  usually  called 
spectrogram.  Since  M(k,  n)  is  an  image,  all  the  image 
processing  techniques  are  available  to  'clean'  the  image. 
Figure  19  shows  a  simplified  block  diagram  of  a  two¬ 


dimensional  filter  approach.  The  two  blocks  in  the  middle 
simply  identify  that  the  magnitude  and  the  phase  functions 
are  modified  using  two  separate  two-dimensional  filters. 


The  resulting  modified  magnitude  and  phase  functions  are 
then  recombined  using  an  appropriate  inverse  transform  and 
synthesis  method  [6]  to  form  the  time  domain  signal. 


Filtering  in  the  Double  Transformed  Domain 


To  introduce  this  approach,  consider  the  spectrogram 
M(k,  n)  displayed  in  Figure  20a  for  the  uncorrupted  speech 
"Don't  gift  wrap  the  tall  glass.  They  shook  hands 
for  good  luck." 

Now  consider  the  two-dimensional  discrete  Fourier  transform 
of  M(k,  n) , 

'jXk,  n)  ]  =  FM(k,  n)  /FP(k,  n)  (5-1) 

The  function  FM(k,  n) ,  corresponding  to  the  spectrogram  in 
Figure  20a,  is  displayed  in  Figure  20b.  It  is  clear  that 
the  two  dimensional  transform  of  M(k,  n)  does  not  have  the 
usual  connotation,  as  M(k,  n)  is  a  function  of  time  and 
frequency.  However,  the  function  in  (5*1)  exhibits  some 
interesting  properties  that  are  amenable  for  noise 
filtering.  Figure  21a  displays  the  spectrogram  for  the  two 
sentences  given  earlier  when  the  speech  is  corrupted  by 
white  noise  to  a  SNR  of  about  0  dB.  Figure  21b  displays  the 
function  FM(k,  n)  in  (5*1)  for  the  spectrogram  in  Figure 
21a.  It  is  clear  from  Figures  20a  and  21a  that  there  is 
hardly  any  resemblance  between  these  spectrograms  even 
though  the  speech  content  is  the  same.  However,  the  story 


b.  TWO-DIMENSIONAL  TRANSFORM 


Figure  21.  Noisy  speech  spectrogram  and  two-dimensional  transform 


is  different  for  FM(k,  n) .  Figures  20b  and  21b  clearly  show 
high  energy  concentrations  near  the  origin.  This 
observation  is  used  in  the  following. 

An  example  of  how  a  speech  signal  may  be  filtered  by  a 
two-dimensional  modification  of  the  spectrogram  is  shown  in 
Figure  22.  The  first  three  parts  of  the  figure,  Figures 
22a,  b,  and  c,  show  the  original  speech  spectrogram,  the 
noisy  speech  spectrogram,  and  the  two  dimensional  Fourier 
transform  of  the  noisy  speech  spectrogram.  Figure  22d  is 
obtained  from  Figure  22c  by  using  a  two-dimensional  filter. 
This  simply  corresponds  to  passing  the  noisy  spectrogram 
through  a  band  pass  filter.  Removing  the  high  frequencies 
smooths  the  spectrogram  while  removing  the  low  frequencies 
enhances  the  contrast  between  the  background  and  the  speech 
signal.  Figure  22e  corresponds  to  the  filtered  speech 
spectrogram,  which  is  obtained  from  Figure  22d  by  inverse 
transforming.  It  is  clear  from  Figures  22a  and  22e  that  the 
filtered  spectrogram  shows  a  great  deal  more  features  of  the 
original  speech  signal  than  the  original  noisy  speech 
spectrogram.  The  results  presented  in  Figure  22  are  for 
magnitude  filtering  only.  The  original  noisy  phase  can  be 
used  for  reconstructing  the  time  domain  signal. 

The  results  presented  in  this  chapter  are  still  at  a 
preliminary  stage.  However,  the  results  indicate  that  there 
is  a  significant  potential  in  studying  these  concepts.  In 
informal  listening  evaluation,  the  two-dimensional  processed 


a.  CLEAN  SPEECH  SPECTROGRAM 


listening  tests  need  to  be  conducted  to  justify  the  results. 
Other  aspects  that  need  to  be  investigated  are  in  the  area 
of  improving  the  phase  estimate  and  making  use  of  the  im¬ 
proved  noisy  phase  in  reconstructing  the  phase.  The  work 
done  by  Oppenheim  et  al  [33]  should  be  helpful  in  this  en¬ 
deavor.  At  the  same  time,  the  3tudy  done  by  Wang  and  Li  m 
[34]  that  the  phase  is  unimportant  in  speech  enhancement 
should  put  a  different  light  on  the  phase  in  the  recon¬ 
struction  of  speech. 

Conclusion 

This  chapter  presented  some  preliminary  results  on 
using  image  processing  techniques  for  speech  enhancement. 
The  basic  idea  is  that  the  two  dimensional  Fourier  trans¬ 
forms  of  clean  and  noisy  speech  spectrograms  have  most  of 
the  speech  energy  concentrated  near  the  origin  and  the  spec¬ 
trogram  constructed  from  this  high  energy  area  obtained  from 
the  noisy  spectrogram  has  more  features  of  the  original 
speech  signal  than  the  original  noisy  speech  spectrogram. 
From  the  informal  listening  tests,  the  two  dimensinal  pro¬ 
cessed  speech  sounds  quieter  with  some  added  clarity. 
Further  work  is  necessary  in  this  area. 
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CHAPTER  VI 


DISCRETE  TIME  ESTIMATION 


The  basic  problem  of  prediction  is  important  in  many 
areas,  such  as  speech  processing  [6],  seismic  signal  proces¬ 
sing  [35]»  control  theory  [36]  and  many  others  [37]*  The 
problem  is  usually  reduced  to  finding  the  inverse  of  the 
data  covariance  matrix.  Considerable  deal  of  work  has  been 
done  for  the  case  of  stationary  process  wherein  the 
covariance  matrix  results  in  a  Toeplitz-type  matrix.  The 
special  structure  of  the  Toeplitz  matrix  of  order  M  allows 
for  an  inversion  in  0(M2)  operations  (multiplications  and 
additions)  compared  to  O(M^)  operations  required  for  the 
inversion  of  an  arbitrary  matrix  [38],  In  this  chapter,  a 
prediction  method  is  presented  for  the  case  of  non-Toeplitz 
covariance  matrices.  A  brief  review  of  the  problem  is 
presented  below.  Given  the  data  yj ,  0  <_  i  £  N-1  ,  find  the 
coefficients  aj^ ,  1  i  _<.  M,  in  an  Mth  order  predictor 
M 

*p  3  •£  ai  yp-i  •  0<p<N-l  (6.1) 

1*1 

such  that  the  mean  squared  error 

Error  *  £  (y  -  ^  )2 
P  p 
P  K 


(6.2) 


i3  minimized.  In  matrix  form,  the  least  squares  problem  can 
be  formulated  by  starting  with 

£  1  ft  .0  : ::  °  In 


yN-l  yN-2 


(6.3) 


If  dj  =  k  >  1,  then  we  identify  the  prediction  as  the 
k  step  prediction  using  an  mth  order  predictor.  In  symbolic 
matrix  form,  (6.3)  can  be  written  as 


iu  =  YM  aj, 


(6.4) 


where  the  bars  below  d  and  a  denote  that  they  are  vectors. 

It  is  well  known  that  the  least  squares  solution  of 

(6.4)  is  given  by 


*M  *  (*M  W1  yM  £n 


(6.5) 


The  recursive  solutions  for  (6.5)  have  been  developed  by 
Lee,  Morf,  Kailath,  Friedlander  and  others  [39-46].  The 
method  presented  here  is  based  purely  on  matrix  algebra  and 
can  be  related  to  the  earlier  work.  The  number  of  oper¬ 
ations  required  by  this  method  is  Q(M2)c,  where  c  is  a  con¬ 
stant  . 
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Set  up  of  Recursion  Equations 


The  solution  in  (6.5)  will 
manner.  Let 

J^T  =  tep+k_i  ep+k-2  eo3 

where 

ei  =  dN_1 _i 


Let 


be  computed  in  the  following 

(6.6) 

(6.7) 

(6.8) 

(6.9) 

(6.10) 


To  relate  (6.8)  to  (6.3),  we  need  to  define  K  =  N-M  in 


The  least  squares  solution  of  (6.8)  can  be  obtained 


from 

Op  BJ>  *p  ■  bp  b 

and  Is 

T-1 

JJ_  -  (B  B')  B„  E 

n3  p  p  p 

and  the  solution  in  (6.5)  is  given  by 


(6.11) 


(6.12) 


=  Cw(M- 1 )  u^(m-2) 


uM(0) 


(6.13) 


The  vector  jj_p  in  (6.9)»  derived  in  (6.12),  will  be  computed 
recursively  and  the  derivation  for  this  is  discussed  in  the 
next  two  sections. 


Structure  of  Bp 

The  matrix  Bp  in  (6.10)  can  be  expresseed  in  two  forms. 


First, 


(6.14) 


where 


Second, 


where 


Vl  “  [°  0  *  '  •  0  y0  *  '  •  yK-l  ] 


(6.16 


(6.17 


Vl  =  [yK+l  ‘  *  *  yK+P'l]  * 


(6.18 


Prom  (6.14),  we  can  write 

Pd  r\  u 


T  VlVl 

B  B  = 
p  p  T  T 

Vibm 


Vi  V. 

y0  +  ^p-1  Vl 


Prom  (6.16),  we  can  write 


B  BT  = 
P  P 


Vi  Vi  +yK 


Ap-1  Vi  +  yK  Vl 


Vi  Vi  Vi  yK  8p.i  Vi +  Vi  Vi 


(6.19 


(6.20 


Equations  (6.19)  and  (6.20)  will  be  used  in  the  fol 
lowing  for  the  proposed  recursive  algorithm. 


Special  Structure  of  Equation  (6.11) 


For  ease  of  notation,  let  us  define 


P-1  Vi  =  Vr 


»  V*  -J*  "J  -• 


i  i  a  ,  =  L  i » 

p-1  -p-1  ~p- • 


Vi  y« 


=  G 


-P-1 


5p-l  ^p-1  V^* 


B  .  B1  ,  =  R  , 

p-1  p-1  p-1 


(6.22) 

(6.23) 

(6.24) 

(6.25) 


Equation  (6.11)  can  now  be  expressed  for  stages  (p-1)  and  p 
respectively  by 

Rp-1  Vl  ~  Vl 


Rp-1  Lp-1 

'Up  <’)■ 

rn  .  i 

-p-i 

.  Lp-1  rp 

Pp  U) 

s 

y0  ep+K  +  ^p-1  Vl 

(6.26) 

(6.27) 


where  we  have  used  (6.19)  for  Rp  and 
2  A  VT  * 

rp  “  y0  *  Vl  V'  • 


(6.28) 


Recursive  solutions  for  (6.27)  -  (6.28)  require  the 

knowledge  of  the  solution  of  the  following  equations 


p-1 

a  .  = 

-p-1 

•  v.- 

(6.29) 

I 

6  , 

=  L  .  , 

(6.30) 

p-1 

Vi 

Vi 

Vi 

(6.31 ) 

Vi 

■  Vi  • 
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This  requires  a  few  intermediate  steps,  and  are  discussed 
below. 

Intermediate  Steps 

Consider  the  equations 


[Vi 

+  C  ,  CT  .1 
-p-1  -P-1J 

Vl 

Vi 

[Vi 

+ 

4° 

1 

1 

1 _ 1 

tl 

1 

•op" 

Vi 

[v, 

+  C  .  CT 

-p-1  -p-1j 

Vl  = 

Vi 

It  is 

well  known 

that  [47] 

[v. 

♦  V. 

-1 

*  Vl 

1 

V 

with 

Vi  = 

u£.  "p-i 

Vi  • 

Using  the  solutions  for  (6 


(6.32) 

(6.33) 

(6.34) 

Vi  £p-i  Vi  (6-35) 

(6.36) 

29)  -  (6.31  )»  we  can  write 


V'  =  1  *(+)  £1  VI 

Vi  =  Vi  ..2W1  (4-i  VI1 


(6.37) 


(6.38) 


and  (6.45b)  follows.  In  a  similar  manner,  the  others  can  be 
shown. 

Pinal  Solution 


Using  the  analysis  discussed  above,  we  can  obtain  the 
solution  of  (6.27),  and  is 


V2> 


.(*, 


O  e£+K  +  X 


Vi  Vi) 

p  I  i 


SI  ,  H  , 
tp-J_  -p-t 


6  .  L  i 

~p-l  -p-1 


i!p  (')  ’  Vl  '  "P12’  Vi 


(6.50a) 

(6.50b) 


It  is  clear  that  the  solutions  for  equations  (6.26), 
(6.29)  -  (6.31)  are  assumed  to  be  known  for  stage  (p-1). 
The  solutions  for  stage  p  are  given  in  (6.45)  -  (6.47)  and 
(6-50).  It  is  interesting  to  point  out  that  we  have  used 
four  sets  of  equations  to  solve  the  generalized  prediction 
problem  as  compared  to  one  set  of  Teoplitz-type  normal 
equations  for  the  stationary  process.  Parallels  can  be  seen 
between  the  above  solution  for  one  set  of  equations  and  the 
classical  solutions  of  Toeplitz-type  normal  equations  [48 J. 
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CHAPTER  VII 


DESIGN  OP  PAST  RECURSIVE  ESTIMATORS 

In  many  signal  processing  applications,  computational 
speed  is  of  considerable  importance.  It  is  often  desired  to 
have  an  excellent  estimate  available,  and  generated  rapidly 
from  a  large  amount  of  data.  This  means  that  only  a  limited 
number  of  multiplications  are  allowed  in  obtaining  the  esti¬ 
mate.  It  is  therefore  important  that  the  multiplications  be 
selected  so  that  they  are  maximally  effective  in  generating 
a  good  estimate.  The  coefficient  involved  in  the  multipli¬ 
cation  should  be  optimal,  and  the  data  multiplied  should  be 
optimally  selected  with  regard  to  some  performance  measure. 
It  is  reasonable,  if  much  data  is  to  be  processed,  to  spend 
a  great  deal  of  design  effort  in  solving  the  optimization 
problem.  Of  course  this  design  effort,  if  it  is  very  in¬ 
volved,  must  be  done  before  the  data  becomes  available,  i.e. 
it  must  not  require  the  data  but  only  a  statistical 
knowledge  of  the  data. 

In  this  Chapter  we  propose  a  recursive  digital  filter 
with  a  fixed  number  of  multiplications  as  our  estimating 
structure.  In  certain  cases  [49]»  primarily  when  state 
models  are  available,  recursive  estimators  have  proved  to  be 
a  computationally  efficient  means  of  solving  the  normal 
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equations  and  generating  the  best  linear  estimate.  Of 
course  the  Kalman  filter  [50]  is  the  best  known  of  such 
results.  The  filter  structure  we  propose  is  not  in  general 
optimal.  It  is  in  part  dictated  by  the  allowable  number  of 
multiplications.  The  coefficients  are  optimized,  and  the 
structure  is  optimized  to  the  extent  that  the  best  measure¬ 
ments  are  selected.  Optimal  measurement  strategies  have 
previously  been  considered  in  control  and  estimation  appli¬ 
cations  with  state  models  and  white  noise  [51-54]-  In  this 
Chapter  however,  we  only  assume  the  availability  of  a  sta¬ 
tistical  knowledge  of  the  observation  set  and  its  relation 
to  the  signal  to  be  estimated.  At  each  stage,  a  subset  of 
the  data  is  to  be  summed  and  multiplied  by  the  best  coef¬ 
ficient,  and  combined  with  a  linear  combination  of  previous 
estimates  to  provide  the  new  estimate.  The  parameter  opti¬ 
mization  component  of  the  design  is  relatively  simplified, 
with  no  difficult  matrix  inversions  required  to  obtain  the 
best  set  of  coefficients.  The  choosing  of  the  best  data 
selection  vector  is  shown  to  be  related  to  a  classical 
family  of  integer  programming  problems  which  have  received 
much  attention  [12,  13>  55  ]»  and  include  the  famous 
’’Traveling  Salesman  Problem.”  For  a  good  review  of  the 
integer  programming  problem  imbedded  in  this  study  the 
reader  is  referred  to  [12].  The  most  computationally  ef¬ 
ficient  means  of  solving  the  problem  available  at  this  time 
may  be  found  in  [13]* 
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Problem  Statement 


Consider  the  situation  in  which  a  large  volume  of  data 
is  available.  This  data  is  collected  in  one  large  vector, 
y,  having  elements  y^ .  A  recursive  estimator  is  to  be  de¬ 
signed,  of  the  form 


T  M  - 

Yjy  +  3 


M+-l,M+2, . . . 


(7.1  ) 


where  and  Yji  are  scalars  to  be  selected  in  order  to 
minimize  the  performance  measure 


*  EUx-Xj)2} 


(7-2) 

The  parameter  x  is  the  unknown  signal  to  be  estimated  by 


processing  the  data  according  to  (7.1).  The  vector,  is 


a  selection  vector  whose  elements  are  only  0  or  1.  Thus  the 
estimator  has  its  complexity  restricted  to  M+l  multipli¬ 
cations  per  iteration.  The  problem  is  to  select  the  coef¬ 
ficients,  orj  and  Yji>  and  the  selection  vector,  in  order 
to  obtain  the  best  possible  performance.  The  only  as¬ 
sumption  required  for  the  design  is  that 


Pyy  "  E{yyT} 


(7-5) 


(7.4) 


are  known  quantities.  The  design  is  to  be  carried  off  line, 
so  that  the  only  significant  calculations  which  must  be  done 


in  real  time  are  the  M+l  multiplications.  For  the 

structure  of  the  filter  is 


,  j-1  * 

*i  m  Yj  y  +  './ji-j-i 


while  Xn  is  of  the  form 


(7-5) 


X1  "  alel  y 


(7.6) 


Since  these  initial  estimates  require  fewer  multiplications, 
they  could  be  generated  more  quickly  than  the  later  esti- 
aates  provided  by  (7-1 )•  Thus  a  different  and  variable 
period  between  estimates  could  be  used  during  start  up. 

The  problem  will  be  solved  by  using  the  fact  that 


“in  J.  ■  min  {min  J  |givene  } 

aJ*{Yji,,€J  eJ  aj,{Yji} 


(7-7) 


Thus  the  problem  is  solved  by  considering  a  standard  L-Q 
parameter  optimization  problem,  and  then  optimizing  with 
respect  to  the  selection  vector,  *j. 


Parameter  Optimization 


The  estimate  obtained  from  (7-5)  may  be  written  as 


Wy  *  U  Vi-i* 

j*M+l ,H+2 , . • • 


where  p  is  calculated  recursively  according  to  the  -Igo 
ri  thm 


Vt 


M 

+  l 
1-1 


y£i®£-i 


£  -  M+1.M4-2,, 


(7-9) 


For  2sj<M,  the  estimate  is  calculated  as 


XJ  "  +  1-1 


(7.10) 


and  the  vectors,  p  ,  are  calculated  a3 


T  .  T 

a£e£  +  J  ,  y£i8£~i 
1  £-2,, 


..M 


The  initial  condition  for  (7*10)  is 


*1  “  Vi  y 


and  for  (7.11), 


(7.11) 


(7.12) 


.  T  T 

Bj  •  Vj 


(7.13) 


For  notational  convenience,  we  designate 


BT  -  bT 

eji  Vi 


(7.14) 


The  performance  measure  may  be  written  as 


J.  - 


EHx-a^1,  -  I^B^y)2) 

j-M+l,M+2, 


(7.15) 


Optimization  at  each  stage  requires  that 


0; 


(7.16) 


as  necessary  conditions  for  an  optimum.  This  leads  to  the 
set  of  linear  equations 


Vj  ’  d> 

where 


(7.17) 


(7.18) 


Pj  is  an  (M+l)  x  (M+l)  symmetric  matrix  partitioned  as 


"  p 

P  „ 

ee 

eB 

V 

PBe 

P8B 

. 

(7.19) 

and  dj  is  partitioned  as 


The  submatrix, 


has  as  its 


iktJl  element 


(7.20) 


The  term  Ptc  is  a  scalar  defined  as 


T 

P  -  e,  P  c4 

ee  j  yyj 


(7.22) 


while  P£p  is  an  M  dimensional  row  vector  whose  k*'11  element 


P  **  B.?P  e. 
eBk  jk  yy  i 

It  is  assumed  that  ^  0.  The  term  d€  is  a  scaler 

T 

d  -  e.  P 
E  J  yx 


(7-23) 


(7-24) 


and  dn  is  an  M  dimensioned  column  vector  with  ktn  element 


d«  "  ? 
pv  Jk  y* 


(7.25) 


Equations  such  as  (7*17)  occuring  in  linear  estimation 
tneory  are  generally  referred  to  as  normal  equations.  When 
Pj  is  nonsingular,  the  optimal  set  of  coefficients  is  ob¬ 
tained  by  solving  for  vj  as 


v  -  P  _1d 

j  J  i 


(7.26) 


During  the  start  up  period  (2<j<M),  the  preceding  equations 
(7.15  -  7.26)  are  applicable  with  M  replaced  by  j-l.  When 
the  set  of  coefficients  obtained  from  (7.26)  is  used  we  get 


min  {J.j given  e.)  -  P  -  d  TP  <i 
J  j  xac  3  j  3 


where  P„„  =  Elx^l 


(7.27) 


According  to  (7*7),  we  want  to  minimize  {7.21)  with 


respect  to  where  the  elements  of  can  be  only  1  or  0. 
J  J 

Clearly  this  is  equivalent  to  maximizing  the  expression. 


*  T  -1 

Jj  H  d3  Pj  dj 


(7.28) 


Optimizing  the  selection  Vector 


It  will  be  shown  that  the  maximization  of  J  is  related 

to  a  classic  problem  l  12,  13*  55]  generally  referred  to  as 

# 

the  quadratic  assignment  problem.  Maximization  of  J  is 


equivalent  to  maximizing  a  ratio  of  quadratic  forms  in  c 


J 


Although  this  is  an  easy  problem,  with  an  elegant  solution 
when  may  be  freely  chosen  [56],  it  is  a  difficult  problem 
when  each  element  of  is  either  a  zero  or  a  one.  There¬ 
fore  in  most  applications  one  would  probably  have  to 
restrict  the  selection  of  fj  to  a  search  for  the  best  out  of 
a  reasonable  number  of  candidate  vectors.  The  candidate 
vectors  would  be  hueristically  selected,  with  some  guiding 
principles  which  will  be  discussed. 
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A 

B* 

bt 

1 

c 

(7.29) 


where 


WE$P ee  ^ 
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(7-30) 


ee  .eg 
-1  -2 

A  -  P  -p  p  _CP 
tt  EC  E$  fie 

The  performance  measure  may  be  written  as 


J*  -  d  TAd  +2d  TBda+daTCda 
e  e  e  8  0  0 


(7.31) 

(7.32) 


(7.33) 


where  we  have  left  off  the  subscript  j  for  convenience. 
Using  the  matrix  inversion  lemma  [58],  we  see  that  (7-30) 
can  be  expressed  as 


c  -  >  [pE6 W-r.cJ 


(7.34) 


We  may  write 

dB  -  »TV 


P-  •  ®Tp  c 
Be  yx 


(7.35) 

(7*36) 


if  Pj  is  defined  according  to 


bTh 

J  • 


J  <7. 37) 

Substitution  from  [1.22)  and  (7*34  -  1.31)  in  the  last  term 
of  (7-33)  gives 


T 

Vcd6 


e  Q  e 
x22 


(7.38) 


where  R  is  defined  as 


with  Q«  4  defined  as 


Thus  the  expression  for  J  may  be  written  as 


«  IQU  +  2Q12  +  <J„Jt 


This  is  more  conveniently  written  as 


(7.46) 


(7-47) 


After  substituting  in  (7-47)  for  the  terms  Qjj,  we  get 


T  eT[l-P  M]P  P  T[l-P  M]1 

j*  -  p  tmp _ yt  y*  y*  yy 

yx  yx  T_ 

c  Re 


where 


-  '-IT 

M  -  8Pm  V 


(7-48) 


(7-49) 


The  first  term  is  a  scalar,  unaffected  by  the  choice  of  t. 
The  second  term  is  a  ratio  of  two  quadratic  forms  in  c 
Thus  (7-48)  may  be  written  as 


J*  -  P  HP  +  J** 
jx  yx 


where 


(7.50) 


T  T  T 
e  G  P  P  Ge 


j**  - 


T 

e  GP  e 


.  65*0 


(7.51) 


(7-52) 


The  goal  of  selecting  «j  is  to  maximize  the  ratio  expressed 
in  (7.51)*  The  symmetric  weighting  matrices  in  the  quad¬ 
ratic  forms  in  the  numerator  and  denominator  are  positive 
semi-definite,  and  positive  definite  respectively.  We 
remark  that  the  maximization  of  (7-51)  must  be  done  at  each 
stage,  and  so  it  is  a  significant  task.  (We  have  left  off 

x  i- 

the  index  J  indicating  the  stage.) 

Although  problems  related  to  the  performance  indicator 
(7-51)  have  been  treated  in  the  literature  [53—55  J »  the 
difficulty  of  the  quadratic  assignment  problem  should  not  be 
underestimated.  As  an  example,  for  even  a  modest  amount  of 
data,  say  y  is  of  length  26,  it  would  take  over  a  minute  to 
calculate  (7-51)  for  each  possibility,  assuming  the  calcu¬ 
lation  could  be  done  in  a  microsecond.  If  the  length  of  the 
data  vector  were  100,  the  time  required  to  check  all  pos¬ 
sibilities  would  be  measured  in  centuries. 

Instead  of  checking  all  possibilities,  we  can  restrict 
ourselves  to  the  most  likely  candidates,  and  we  can  elimi¬ 
nate  those  selection  vectors  which  have  previously  been 
used,  or  which  can  be  formed  by  summing  those  selection 
vectors  already  used.  Knowledge  of  the  truly  optimal  linear 
estimate , 


x  ■  P  P  y  ■  I  a.y. 

°  *y  yy  ±ml  i  i 


(7.55) 


can  be  used  to  find  the  likely  candidates.  As  a  simple 


example,  if 


*o  15  1-!y1  +  *9y2  +  1.0y3 

+  .lly^  +  .09y5 

(7.54) 

then  one  would  probably  guess  that  =  [11100],  = 

[0001 1 J  would  be  excellent  choices  for  selection  vectors. 
Furthermore  one  would  expect  performance  to  be  good  with 
only  two  multiplications  allowed.  We  believe,  however,  that 
even  with  problems  as  simple  as  this,  intuition  is  subject 
to  error  when  choosing  selection  vectors.  Naturally  when  y 
is  a  very  long  vector  some  computational  assistance  will  be 
required  to  select  the  candidate  vectors.  It  should  be 
remembered  that  this  is  a  part  of  the  design,  and  that  the 
purpose  of  going  to  this  design  effort  is  to  restrict  our- 
seleves  to  an  algorithm  with  few  multiplications  required  so 
that  the  filtering  algorithm  generates  estimates  quickly. 
We  are  willing  to  spend  considerable  extra  design  effort  to 
get  a  fast  processor. 

The  Design  Algorithm 

To  start  the  algorithm,  we  must  select  ,  and  to 
minimize  where 


(7-55) 
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J.  -  E{(x-x  )2} 


This  gives  for  a 


T  T 

«i  -  e,  P  /e,  P  e. 

1  1  yx  1  yy  1 


Since  when  this  is  used, 


J,  -  P  -  e  TP  P  Te  /e.TP  e. 
X  xx  1  yx  yx  1  1  yy  1 


we  select  to  maximize  the  quality 


T  T 

e.  *P  P  e. 
1  yx  yx  1 

T_ 

c_  P  e , 

1  yy  1 


Thus  and  are  obtained,  and  (31  is  found  as 


‘iT  -  VlT 


The  next  step  is  to  form  the  term 


*2  ’  eieiT/<eiTp„e1) 


and  evaluate 


G2T  5 


The  terms  G|pyy  and  G?Pyx  are  calculated, 
indicated  by  (7-51) 


(7.56) 


(7-57) 


(7*58) 


(7-59) 


(7.60) 


(7.61) 

The  expressi on 


*2  ”2  yy  2 

(7.62) 

is  maximized  with  respect  to  <2* 

The  coefficients,  <*2  and  Y21  are  found  according  to 
(7.26),  i . e . 


(7.64) 

Then  Pg  i3  found  as 

•/  -  V2t  + 

(7.65) 


Using  the  notation  indicated  by  ( 7 . 1 4 ) 


and  according  to  (7*14) 


g31  |  . 

w 

to 

H 

M 

»xT 

-  > 

The  matrix  is  formed  as 


(7-67) 


Mj  -  »3le3TV3rV 

(7.68) 

Equations  (7*61)  and  (7*62)  are  applicable  with  the  sub¬ 
script  "2"  replaced  by  "3"  and  ^  is  selected  accordingly. 
The  new  coefficients  are  obtained  by  solving  (7.26) 


P 

e3e3 

P<:3S3 

Pfl3e3 

• 

hi] 

where 


and 


(7-69) 


e3c3  63  PyyE3  P03®3  "  B3  PyyB3 


C3®3  "  'I*™**  %  "  V  “  B3Tpy* 


(7-70) 


(7-71) 


These  steps  can  be  generalized.  Let  us  assume  that  we 

are  in  the  start  up  period  (j<M+l),  and  have  ,  and 

«J  J  J 

where 


We  then  solve  for  pj  according  to  (7*11) 


<J<  ■»>  <¥• 

Y"Vi  +  1  YUB 

3  J  J  i«l  J1  j-i 


and  form 


?T 


f»,fl 


3+1  iVj 


Next,  the  matrix  is  calculated: 


Mj+1  "  8j+llBj+lPyyBj+l1  8j+l 


and  i s  evaluated  as 


GJ+i  -  u-OL,i 


yyj+i 


The  vector  €j+^  which  maximizes 


JJ+1  "  €j+lGJ+lPyxPyx  GJ+lCJ+l/ej+l  T<?J+lTpyyCJ+l 


is  selected. 


We  then  solve  for  the  coefficients 


where 


*J+lEj+l 


Ej+1  PyyCj+l 


6j+i6j+i  J+1  PyyBJ+i 


»  ■'  .  E  Tp  J  T  , 

cj+iBj+i  J+1  yy  i+1  « 


T 

J+l  eJ+lPy*  dej+1  “  VlTpy*  (7.79) 


The  algorithm  is  then  repeated  as  given  until  it  is  desired 
to  limit  the  number  of  multiplications  to  M+l  as  we  have 
indicated  in  this  paper.  Assuming  that  we  have  p^,  o j , 
and  Yj  for  j>M+l,  where 


5 


*1-1 


,TJ.l 


M.H 


(7.80) 


we  solve  for  p^  according  to  (7*9).  Then  (7-75)  through 
(7«78)  are  applied  and  (7*77)  is  maximized.  Equation  (7-78) 
is  used  to  select  the  new  coefficients.  The  algorithm  is 
thus  established  for  all  j. 

The  reader  may  be  troubled  at  this  point  because  it 
appears  that  an  ever  larger  matrix  needs  to  be  inverted  at 
each  stage,  during  the  start  up  period  and  that  a  large 
matrix  (if  M  is  large)  needs  to  be  inverted  thereafter.  We 
shall  show  that  there  are  computationally  efficient  re¬ 
cursive  means  of  carrying  out  the  required  matrix  inversion. 


Efficient  Matrix  Inversion 


The  Start  Up  Period: 

It  is  necessary  at  each  stage  to  calculate  two  matrix 

inverses  Pqq.-'5'  and  Pj-1*  In  this  section  we  indicate  how 
P  P  J 
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to  calculate  these  terms  at  stage  j+1  given  these  terms  are 


available  at  stage  j .  Thus  a  recursive  procedure  is  es¬ 
tablished.  During  the  start  up  period  we  note  that 


T 


e,Tp_ A  I  b.tp. 


f/Vj  ^  J 

The  Matrix  inverse  [57],  can  be  written  as 


(7.81) 


PPJ+1 

where 


r1  -  4 


Ltp_batp  I.  -l 


J 

Vj 

TP  B.a,. 

yy  1 

l 

T 

e/P^ 
J  yy  j 

is 

®,TP_54a„i4TP  8 

8J  PyyBj 


(eJ  W 


(7.82) 


(7.85) 

Using  the  matrix  inversion  lemma  [58],  we  see  that  (7.83) 
may  be  written  as 


■  v;  -  ^  rT-B’v>rl 


(7.86) 


■> 


where 


r  " 

J  77  j  (7.87) 

Equation  (7*86)  requires  only  the  inversion  of  a  scalar,  and 
since  Ppp  j  is  known,  we  see  that  we  have  an  efficient  way 

”1  rt  -i  tran  T5  *"1 


of  finding  given  PppJ  . 


Once  Ppp^^1  is  known  it 


is  not  a  difficult  matter  to  calculate  Pj+J^  using  equations 
(7-29  -  7«32)  and  (7.34) •  In  this  case  also,  only  a  scalar 
needs  to  be  inverted.  During  the  start  up  period  then, 
matrix  inversion  does  not  present  a  problem.  This  is  al30 
the  case  during  the  remaining  period  with  j>M+l. 


After  Start  Up: 


In  the  previous  section  a  method  wa3  developed  for 
inverting  a  matrix  of  increasing  dimension,  using  the  result 
from  the  previous  3tage.  In  this  section  we  are  inverting  a 
different  matrix  of  the  same  dimension  at  each  stage. 
Because  the  matrix  at  one  stage  is  very  closely  related  to 
the  matrix  at  the  previous  stage,  it  is  possible  to  obtain 
the  new  inverse  from  the  old,  with  surprisingly  small  amount 
of  calculation. 

It  is  known  that  the  lower  right  hand  portion  of  Pppj+i 

is  the  same  as  the  upper  left  hand  portion  of  Pa  „  .  ope- 

PP  j 

cif ically , 


e.'P  b.  i  8‘p  $* 


1+1  »jVj 


e.  F  B. 
i  yyj 


dlll  d12 


d12  |  C11 


(7.88) 


r  *T  * 

b/p  e. 


-‘iVl-M  1.  Kl  I  C12 


3  lV*^yy*J  |ci2  'c 


where 


„T  I 


fi*T  pj-l 
Pj 

bt 

rj-w-i, 

Suppose  that  P 


-1  ,• 


is  known  and  partioned  as 


b  , 

b  ' 

11 

12 

h  T 
b12 

b22 

(7.89) 


(7-90) 


(7.91) 

where  b-j^  is  an  (M-l)  x  (M-l)  matrix  and  b£2  is  a  scalar. 
We  know  from  [57],  that  this  matrix  inverse  can  also  be 
written  as 


-1  f  cu  +  I  -cIici2Dl 


where 


D  -  [C22  -  c12cn  C\2^ 


(7.93) 


Comparing  (7-91  )  and  (7*92)  it  is  clear  that  we  can  solve 
for  C^1  in  terms  of  known  quantities. 


ku  t'-'i/i!1'1 


(7-94) 


Prom  the  matrix  inversion  lemma,  the  above  may  be  evaluated 
as 


'n1  -  ku  <i«i2ti-bjc12rH,J> 


(7-95) 


Therefore  when  1  is  known  we  may  evaluate  C-jJ1  with  only 


a  scalar  inversion.  As  will  be  seen,  knowledge  of  Cll1  wil1 


-1 


allow  us  to  easily  evaluate  P_ „  A.  As  in  the  obtaining  of 

P0J+1 

(7*92),  we  can  derive  an  expression  for  ^  . 


1  4.  1  i  WT 

”<12*  1 

p  -1- 

v  u  12 

"ll 

1 

2 

10  H 

F 

L  <n 
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where  P  is  defined  as, 


(7-96) 


dI2  d12 


i-l 


but  may  be  more  conveniently  obtained,  again  using  the 
matrix  inversion  lemma: 


r>W  (7.98) 

Only  a  scalar  need  be  inverted. 

Therefore  we  have  established  a  convenient  mechanism  of 
generating  Ppp  First  obtain  using  (7-95).  Then 

obtain  F  using  (7-98)  and  substitute  the  results  in  (7*96). 
After  obtaining  Ppp  it  is  easy  to  calculate  Pj+i1  using 

equations  (7-29  -  7*32)  and  (7*34).  For  both  start  up  and 
afterwords,  we  have  thus  established  a  methodology  for  re¬ 
cursively  calculating  matrix  inverses.  This,  in  effect, 
frees  up  more  time  for  the  quadratic  assignment  problem 
which  is  now  clearly  seen  to  be  the  only  real  difficulty  in 
the  design  procedure.  We  will  not  consider  matrix  inversion 
during  the  transition  between  start  up  and  fixed  length 
operation  here. 


An  Example 

In  this  section,  we  shall  illustrate  the  design  of  the 
filtering  algorithm  with  an  academic  example.  It  is  assumed 
that  4  measurements  are  available 


yk  “  *  +  V  k  “  1 . *  (7.99) 

and  that  x  and  v^  are  uncorrelated.  The  noise,  v^,  is  zero 
mean  with  known  statistics 
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E{vp-! 


E{v22}-  2 


E{v32}»ii 

E(v42}-12 


E{vivk*"°!l*k 


(7.100) 


and  the  desired  signal,  x,  is  known  to  have  zero  mean  and 
variance  of  unity.  The  optimal  linear  estimate  is  easily 
found  to  be 


*o  "  PxyPyyly  "  ‘37Ayl  +  *187y2  +  *03Ay3  +  -031y4‘  (7.101) 

and  the  minimum  mean  square  error  is 


J  -  E{ (x-x  )2}  -  .374 
o  o 


(7-102) 


In  applying  our  algorithm  we  first  want  the  best  esti¬ 
mate  which  involves  a  single  multiplication 


*  T 

X1  "  Vl  y 

We  evaluate  the  choice  of  which 
limiting  ourselves  to  three  candidates 
possibilities , 


(7.103) 
maximizes  (7*58), 
which  appear  to  be 


(7.104) 


It  turns  out  that  the  middle  choice  is  the  one  which  maxi¬ 
mizes  (7*58),  so  evaluating  gives  for  the  first  estimate 


.286(7^2) 


(7.103) 


and  results  in  a  performance, 

J1  -  E {(x-Xj)  }  -  .429  (7-106) 
For  the  next  stage  one  could  argue  that  the  best  thing  to  do 
is  to  make  a  distinction  between  the  two  terms  in  (7*101) 
with  the  larger  coefficients,  or  to  give  some  weighting  to 
the  smaller  terms.  Thus  we  select  «2  fro®  among  the  candi¬ 
dates 


(7-107) 

As  it  turns  out,  the  last  choice  is  the  best,  resulting  in 
the  largest  value  of  (7*62).  Thus  the  best  estimate  at 
stage  2  is 

*2  ‘  •055(),3+V  +  •930X1  (7.108) 

Substitution  from  (7-105)  gives 


x2  -  -035(y3+yA)  +  .266<yi+y2) 

and  this  seems  reasonable  in  view  of  (7.105)* 
formance  measure  is 


(7.109) 
The  per- 


J2  -  E{(x-x2)2>  -  .399  (7.110) 

As  an  observation,  the  other  two  candidates  would  result  in 
equivalent  performance  with  each  other,  a  fact  which  could 
easily  be  reasoned  to.  The  reader  is  encouraged  to  work 
through  this  example  to  notice  that  there  is  not  enough 
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difference  in  the  choice  of  «2  '*:o  ^a-ke  one  confident  that 
the  "intuited"  solution  is  always  going  to  be  the  correct 
solution  vector  even  in  an  academic  problem.  This  points 
out  the  need  for  a  mathematical  approach  such  as  developed 
here . 

Discussion 

We  have  presented  a  method  for  designing  a  recursive 
linear  filter  with  a  fixed  number  of  multiplications  allowed 
for  each  interaction.  This  is  ideal  for  the  situation  when 
estimates  are  going  to  be  required  rapidly,  and  there  will 
be  a  large  amount  of  data  available  all  at  once.  The 
problem  has  had  two  aspects,  parameter  optimization,  and 
selection  vector  optimization.  The  parameter  optimization 
has  been  shown  to  have  a  solution  which  may  be  obtained  in  a 
computationally  efficient  manner.  The  selection  vector 
optimization  problem  is  difficult.  We  have  related  i t  to  a 
classic  problem  in  operations  research,  referred  to  as  the 
quadratic  assignment  problem.  The  difficulty  may  be  limited 
by  limiting  the  candidate  vectors  to  a  few  reasonable 
choices . 

In  this  Chapter  we  have  considered  only  stagewise  opti¬ 
mization  rather  than  optimization  over  the  entire  sequence 
of  estimates  generated.  Since  even  the  simpler  problem 
considered  here  has  an  open  research  area  imbedded  in  it 
(the  quadratic  assignment  problem),  it  may  be  premature  to 
consider  solving  the  more  general  dynamic  optimization 
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problem.  The  solution  of  such  a  problem  is  ultimately  de¬ 
sirable  and  could  be  pursued  within  the  context  of  modern 
control  theory.  It  is  also  desirable  to  solve  a  problem 
similar  to  that  posed  here,  but  in  a  more  general  vector 
format.  The  formal  mathematics  of  parameter  optimization  is 
not  a  difficulty  in  pursuing  the  vector  problem.  Indeed  it 
is  again  the  integer  programming  that  presents  a  compu¬ 
tational  limitation.  While  we  acknowledge  that  one  may  be 
reasonable  speed  hours  of  computational  effort  in  the  design 
of  a  rapid  algorithm,  it  is  clearly  not  reasonable  to  spend 
centuries  at  such  design. 

It  is  the  belief  of  the  authors  that  the  design  pre¬ 
sented  herein,  coupled  with  some  heuristic  appraoches  toward 
limiting  the  number  of  selection  vector  candidates,  repre¬ 
sents  a  reasonable  approach  to  the  design  of  a  rapid 
recursive  algorithm  for  a  certain  class  of  important  esti¬ 
mation  problems. 


CHAPTER  VIII 


CONCLUSIONS 

The  goal  of  this  research  was  to  investigate  the  pos¬ 
sibility  of  combining  an  adaptive  filter  with  a  linear  pre¬ 
dictive  coding  algorithm  to  form  a  robust  and  efficient 
system  for  narrowband  encoding  of  speech  signals  corrupted 
by  noise.  Various  prefiltering  techniques  for  improving 
linear  coding  systems  were  evaluated.  Primary  techniques  of 
interest  were  the  pitch  tracking  adaptive  filter  and  the 
spectral  subtraction  filter.  The  pitch  tracking  adaptive 
filter  proved  successful  in  suppressing  white  noise  in 
voiced  speech  sounds  but  did  not  work  well  when  the  noise 
was  narrowband  such  as  a  single  sine  wave.  It  was  found 
that  interaction  between  the  pitch  period  and  the  narrowband 
noise  produces  a  bias  error  in  the  adaptation  of  the  filter. 

Failure  of  the  pitch  tracking  adaptive  filters  to  sup¬ 
press  narrowband  noise  prompted  the  investigation  of  several 
other  prefiltering  methods.  The  most  successful  of  the 
filters  evaluated  was  the  spectral  subtraction  technique. 
Two  modifications  to  the  original  method  proved  very  useful 
for  improving  noisy  speech.  First  a  dual  time  constant 
noise  spectrum  estimate  improved  white  noise  suppression  and 


secondly  a  spectral  notch  feature  greatly  improved  narrow- 
band  noise  quieting.  Also,  very  successful  was  a  new  filter 
method  based  on  adaptive  filtering. 

Work  with  the  spectral  subtraction  filter  method  sug¬ 
gested  a  more  general  approach  to  speech  filtering.  Short 
time  Fourier  analysis  of  speech  produces  a  two  dimensional 
representation  of  a  speech  signal  which  may  be  processed 
much  like  image  data.  The  investigation  found  that  some 
types  of  speech  and  noise  signals  may  be  separated  using  two 
dimensional  filtering  on  the  short  time  Fourier  transform 
representation  of  a  noisy  speech  signal. 

The  performance  of  the  pitch  tracking  adaptive  filter 
depends  on  the  quality  of  the  pitch  period  estimate  used  to 
set  the  input  delay.  Early  attempts  to  implement  the  filter 
were  frustrated  by  the  degradation  of  currently  available 
pitch  algorithms  in  the  presence  of  noise.  It  was  found 
that  the  adaptive  filter  itself  could  be  modified  to  provide 
a  robust  pitch  estimate.  This  technique  was  used  ex¬ 
tensively  throughout  the  research  to  provide  pitch  estimates 
for  various  processing  algorithms. 

To  complement  the  filtering  algorithms,  fast  algorithms 
have  been  derived  for  ‘efficient  solution  of  the  linear 
estimation  problem.  These  include  a  fast  algorithm  for  the 
solution  of  the  general  discrete  time  linear  estimation 
problem  and  a  new  recursive  linear  estimator  suitable  for 
rapid  estimation  of  a  signal  in  noise.  The  approach  is 
related  to  the  classical  integer  programming  problem. 


Suggestions  for  Further  Stud: 


The  results  presented  here  suggest  several  avenues  one 
could  take  in  the  areas  discussed.  These  are  presented 
below. 

•  Considering  that  the  adaptive  algorithms  are  compu¬ 
tationally  complex,  fast  algorithm  development  jn  thi3 
area  is  important. 

•  The  results  presented  on  the  enhancement  of  speech 
signals  by  two  dimensional  signal  processing  are  still 


at  a  preliminary  stage. 

This  approach 

may 

be 

considered 

as  a  special 

case  of  the  problem 

of 

estimating 

time  varying 

process  parameters 

in 

the 

presence  of  stationary  noise.  This  area  is  wide  open 
as  all  the  image  processing  techniques  are  available 
for  noise  suppression,  coding,  data  compression,  etc. 

•  Most  of  the  estimation  algorithms  are  based  upon  the 
least  squares  analysis  (  analysis).  It  is  worthwhile 
to  investigate  the  possibilities  of  using  1^  (or  in 
general  Ip,  1  <_  p  <_  2)  analysis  for  signal  estimation 
when  the  signal  is  burried  noise.  Again,  this  area  has 
wide  implication  in  speech  processing. 

•  In  the  results  presented  on  the  design  of  fast  re¬ 
cursive  estimators,  we  have  considered  only  stage  wise 
optimization  rather  than  optimization  over  the  entire 
sequence  of  estimates.  The  more  general  dynamic  opti¬ 
mization  problem  may  be  a  difficult  problem  to  tackle. 
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However,  it  is  desirable  to  solve  this  problem  and 
could  be  pursued  within  the  context  of  modern  control 
theory. 
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CHAPTER  1 


INTRODUCTION 

Nearly  all  sciences  are  concerned  with  the  analysis  of 
measurement  data.  The  following  chapters  will  present  a  new 
tool  for  the  analysis  of  time  series  measurements;  in  par¬ 
ticular,  a  new  method  of  spectral  estimation  is  presented. 
Many  spectral  estimation  methods  already  exist  and,  in¬ 
creasingly,  new  methods  continue  to  be  developed;  therefore, 
it  is  appropriate  to  reflect,  briefly,  upon  the  reasons  for 
such  continued  activity  in  an  area  already  so  well  re¬ 
searched  . 

A  synergism  exists  between  advances  in  computer  tech¬ 
nology  and  advances  in  practical  methods  of  time  series 
analysis.  As  more  effective  (and  complex)  methods  of  time 
series  analysis  are  developed,  the  demands  for  smaller, 
cheaper,  and  faster  digital  circuitry  (capable  of  imple¬ 
menting  these  methods  within  the  size/cost/power  constraints 
of  various  applications)  are  increased.  As  smaller, 
cheaper,  faster  and  more  reliable  digital  circuitry  becomes 
available,  more  complex  (and  effective)  methods  of  time 
series  analysis  become  practical.  Fundamentally,  however, 
it  is  the  demand  for  improved  solutions  to  engineering  prob¬ 
lems  that  motivates  the  desire  for  more  effective  methods  of 
time  series  analysis. 


Motivation 


Most  information  we  have  about  the  world  around  us  is 
received  indirectly  through  time  series  measurements.  In 
the  case  of  vision,  one  determines  the  shape  (and  other 
characteristics)  of  an  object  by  reception  (measurement)  of 
light  waves  scattered  by  the  object.  In  the  case  of  speech, 
one  determines  the  intended  message  of  the  speaker  by  re¬ 
ception  (measurement)  of  acoustic  pressure  waves.  Pros¬ 
pecting,  manufacturing,  astronomy,  medicine,  and  economics 
are  but  a  few  of  the  areas  that  can  benefit  from  improved 
methods  of  time  series  analysis. 

Spectral  estimation  is  one  of  the  most  important  areas 
of  time  series  analysis.  In  many  cases,  knowledge  of  the 
time  series  spectrum  is  adequate  to  answer  all  important 
questions  regarding  the  system  producing  the  time  series;  in 
the  case  of  a  stable  time-invariant  linear  input-output 
system,  knowledge  of  the  output  process  spectrum  (together 
with  the  statistics  of  the  stationary  input  process)  will 
completely  characterize  the  system. 

Noise  corruption  is  among  the  fundamental  problems  of 
time  series  analysis.  All  useful  analysis  techniques  for 
measurement  data  are  at  least  mildly  tolerant  of  noise  since 
there  always  exists  a  small  probability  of  measurement 
error;  some  techniques  are  specifically  designed  to  account 
for  knowledge  of  the  noise  statistics  in  the  analysis  of 
noise-corrupted  measurement  data.  Regardless  of  the  analy¬ 
sis  technique,  the  fundamental  performance  limits  are  always 
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reduced  by  the  presence  of  noise.1  Consequently,  it  is 
always  advisable  to  minimize  noise  corruption  as  much  as  is 
practical;  still,  practical  constraints  imposed  by  some 
situations  do  not  permit  the  reduction  of  noise  corruption 
to  insignificant  levels  so  that  sophisticated  analysis  tech¬ 
niques  are  required  to  achieve  the  best  possible  per¬ 
formance  . 

Spectral  estimation  is  of  fundamental  importance  to  the 
various  applications  of  speech  analysis  and  practical  con¬ 
straints  imposed  by  many  of  these  applications  do  not  permit 
the  reduction  of  noise  corruption  to  insignificant  levels. 
Examples  of  such  applications  include  low  data  rate 
digital  voice  communications  systems  and  speech 
recognition/understanding  systems  among  others;  often  the 
cost  and/or  inconvenience  of  shielding  from  environmental 
noise  makes  significant  acoustic  noise  corruption  inevi¬ 
table. 

Autoregressive  (AR)  spectral  models  have  been  suc¬ 
cessful  for  various  systems  involving  speech  analysis;  more¬ 
over,  numerous  speech  synthesis  systems  based  upon  the  AR 
model  have  become  commercially  available  in  recent  years. 
Because  the  currently  available  practical  methods  for  AR 
parameter  estimation  yield  poor  results  in  common  noise 

1 1n  some  specialized  circumstances  the  performance 
limits  are  unchanged  by  the  presence  of  noise.  Even  when 
this  is  the  case,  the  complexity  of  the  analysis  methods 
required  to  achieve  these  limits  is  usually  increased  by  the 
noise  presence. 


environments  but  are  effective  in  sufficiently  quiet  en¬ 
vironments,  it  is  reasonable  to  retain  the  AR  model  for  the 
speech  process  while  attempting  to  develop  improved  methods 
for  estimating  the  AR  parameters. 

The  fundamental  limit  to  the  performance  of  any  esti¬ 
mation  procedure  depends  upon  the  available  information.  In 
theory,  even  the  most  obscure  (but  not  unrelated)  additional 
information  may  be  used  to  improve  a  parameter  estimate;  of 
course,  one  should  rely  first  upon  information  that  is  both 
easily  available  and  expected  to  provide  substantial  im¬ 
provement  . 

Most  recent  efforts  to  overcome  the  poor  performance  of 
classical  AR  estimators  in  noise,  including  the  present  one, 
have  attempted  to  employ  information  regarding  the  noise 
statistics  in  addition  to  the  noise  corrupted  time  series 
observations.  This  information  is  often  provided  simply  by 
deploying  additional  sensors  intended  to  measure  the  noise 
directly;  other  speech  analysis  systems  employ  prior  seg¬ 
ments  of  the  primary  observation  signal  that  are  thought  to 
be  free  from  speech  activity  to  predict  the  current  relevant 
noise  statistics. 

The  present  work  does  not  address  the  problem  of  ob¬ 
taining  accurate  noise  statistics.  Assuming  appropriate 
noise  statistics  to  be  available,  the  following  chapters 
develop  a  new  and  improved  method  of  estimating  the  AR 
signal  parameters  from  noise  corrupted  time  series  obser¬ 


vations. 


As  might  be  expected,  the  method  entails  increased 
computational  cost  over  less  effective  techniques;  it  is 
expected  that  performance  requirements  of  speech  analysis 
(and  other)  applications  -  as  well  as  cost  reductions  that 
are  continually  provided  by  advances  in  computer  technology- 
shall,  in  many  cases,  make  the  advantages  of  this  method 
appear  relatively  inexpensive. 

Overview 

Chapter  II  provides  a  general  discussion  of  the  various 
issues  and  techniques  of  spectral  estimation;  particular 
attention  is  given  to  the  problems  of  AR  spectral  esti¬ 
mation.  In  addition,  this  discussion  introduces  basic 
formulae  and  provides  an  historical  perspective  for  the 
subsequent  chapters. 

Chapter  III  presents  the  theoretical  foundations  of  the 
new  (weighted  information)  estimation  procedure.  After  some 
additional  motivational  discussion,  the  method  is  formulated 
as  an  approximation  to  an  ideal  (but  intractable)  formu¬ 
lation  and  a  generalization  of  a  commonly  employed  (noise 
filtering)  estimation  procedure.  In  addition  to  the  general 
formulation,  significant  contributions  of  this  chapter 
include  the  analogy  leading  to  equation  (3*20)  and  the 
properties  developed  in  the  fifth  section. 

Chapter  IV  discusses  a  variety  of  computational  methods 
relevant  to  AR  estimation  based  upon  the  weighted  infor¬ 
mation  formulation.  It  is  considered  that  the  area  of 


computational  procedures  as  requiring  the  greatest  attention 
for  further  extension  and  refinement  of  this  work.  Only  the 
formulae  for  vector  quantization,  in  particular  Eciuations 
(4«58a)  and  (4*81),  appear  ready  for  detailed 
cost/performance  analyses. 

Chapter  V  demonstrates  clearly  that  the  weighted  infor¬ 
mation  formulation  leads  to  reduced  estimation  error  as 
compared  to  the  more  common  noise  filtering  formulation. 
Examples  from  both  simulated  and  real  speech  are  provided. 
The  demonstration  relies  upon  the  reader's  visual  assessment 
of  scatter  plots;  thus  it  is  somewhat  qualitative.  A  more 
quantitative  assessment  (e.g.  a  compp.rison  of  empirical 
variance  to  theoretical  performance  bounds)  would  be  inter¬ 
esting;  however,  one  would  still  have  difficulty  evaluating 
the  significance  of  a  reduction  in  empirical  variance  to  the 
performance  of  a  particular  system.  Without  a  full  imple¬ 
mentation  one  must  rely  upon  experience  and  judgement  as 
well  as  the  available  experimental  evidence. 

Finally,  Chapter  VI  summarizes  the  results  of  this 
effort  and  provides  suggestions  as  to  how  this  work  may  be 
effectively  extended  and  refined. 


CHAPTER  II 


i 

GENERAL  DISCUSSION 

Spectral  estimation  is  a  problem  of  statistical  infer- 
I  ence  with  a  long  history  due  to  its  pervasive  importance  in 

scientific  applications  [l].  Modern  empirical  spectral 
analysis  began  to  take  shape  as  an  organized  discipline  with 
i  the  introduction  in  1893  of  the  periodogram  by  Schuster  [2]. 

Given  N  observations  (xn;  n=0, 1 , . . . ,N-1 }  of  a  time 
series  at  unit  time  intervals  the  periodogram,  f(e),  is 

i 

|  defined  as 

i 

f(e)  =  XN(eie)  XN(e”i0)/N  (2.1) 

i 

where 

N-l 

XN(z)  =  xn  z~n  ;  z  as  eie  (2.2) 

n=0 

Still  in  use  today,  the  periodogram  was  practically  the  sole 
computational  tool  of  empirical  spectral  analysis  until  Yule 
introduced  in  1927  his  method  of  autoregressive  (AR)  spec¬ 
tral  analysis  [3j* 

An  AR(P),  or  Pth  order  autoregressive,  model  spectrum, 

i.  L 

g(e) ,  is  characterized  by  a  model  gain,  <r  ,  and  a  monic  P 


S 


order  polynomial,  z**Ap(z),  and  ia  defined  by  them  as 


g(8)  =  <r2/  |  Ap ( e 10  )  |  2  (2.3) 

The  polynomial  may  be  characterized  by  a  variety  of  parame¬ 
ter  sets.  One  parameter  aet,  known  aa  predictor  coef- 

ficients  { a^ ;  n=1,2 . P},  definea  the  polynomial  according 

to 

P 

Ap(z)  =  ^  z_n  *  ao  =  1  (2.4) 

n=0 

In  contraat  to  Schuater'a  nonpar sunetric  method  of  apectral 
analy8is,  Yule'a  parametric  method  firat  introducea  the 
above  mathematical  model,  justified  by  phyaical  argumenta, 
and  then  uaea  the  available  data  to  estimate  the  model  pa¬ 
rameters.  These  estimates  are  provided  by  the  solution  to 
the  Yule-Walker  [4j  equations 

P 

^  ^  r|n-m|  ^  =  ®n  »  n=0,1,...,P  (2.5) 

m=0 

where 

N-n-1 

rn  =  ;  n=0, 1 , . . . ,  P  (2.6) 

m=0 

are  the  biased  sample  autocorrelation  lag  estimates. 


Ill 


Model  Selection 


A  variety  of  other  parametric  spectral  models  have  been 
introduced  and  studied  during  the  past  half  century;  several 
of  them  are  worth  noting.  The  moving-average  (MA)  model, 
like  the  AR  model,  is  characterized  by  a  polynomial  but 
differs  in  that  the  polynomial  appears  in  the  numerator;  the 
Schuster  periodogram  may  be  viewed  as  an  MA  model  spec¬ 
trum.  ^  Similarly,  ARMA  models  are  described  by  both  numer¬ 
ator  and  denominator  polynomials;  these  spectra  are  of 
particular  importance  in  engineering  applications  since  they 
characterize  all  stable  linear  systems  with  a  finite  dimen¬ 
sional  state  vector.  The  Blackman-Tukey  [5]  model  spectrum 
consists  of  a  finite  sum  of  cosine  terms;  it  is  obtained  by 
Fourier  [6]  transformation  of  the  product  of  the  autocorre¬ 
lation  sequence  and  a  finite  support  window.  The  Pisarenko 
[7]  model  consists  of  a  constant  plus  a  finite  number  of 
delta  functions.  Various  combinations  of  these  models  are 
also  occasionally  employed. 

Most  often  a  new  model  is  introduced  (together  with  a 
procedure  for  estimating  its  parameters)  simply  because  it 
seems  reasonable  relative  to  the  phenomenon  being  studied 
and  due  to  deficiencies  in  the  currently  popular 


Facts  such  as  these  tend  to  blur  the  distinction 
between  parametric  and  nonparametric  methods.  Since  any 
estimate  can  be  described  as  a  member  of  some  parametric 
family  once  it  has  been  derived,  the  distinction  may  be  seen 
as  one  of  spirit  rather  than  substance. 
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models. ^  More  recently  the  various  results  of  this  "un¬ 
scientific"  approach  have  been  "justified"  theoretically; 
this  justification  usually  takes  the  form  of  a  principle 
that  should  be  employed  as  a  guide  when  the  requirement  of 
consistency  with  the  available  information  leaves  several 
alternatives.  The  principle  is  usually  embodied  in  the  form 
of  a  functional  whose  extreme  value  is  to  be  found  while  the 
information  is  provided  in  the  form  of  constraint  equations 
(or  inequalities)  for  this  variational  problem. 

Much  of  the  current  literature  is  devoted  to  the  "prin¬ 
ciple  of  maximum  entropy"  which  was  enunciated  by  Jaynes 
[8,  9].  If  the  process  is  zero-mean  stationary  and  Gaus¬ 
sian^  it  is  completely  characterized  by  its  power  spectral 
density  function,  g (0),  (or  "spectrum"  for  short)  and  the 
process  entropy  is  expressed  in  terms  of  it  by 


Q  = 


IT 

In  g(e)  de/2ir 


(2.7) 


O 

We  shall  adopt  this  pragmatic  view  later  when  modeling 
speech  in  an  acoustically  noisy  environment. 

^Sometimes  a  model  is  used  in  spite  of  its  less 
reasonable  form  simply  because  the  available  parameter 
estimation  methods  yield  more  successful  overall  results. 
Thus  AR  models  are  employed  (instead  of  the  Pisarenko  model) 
to  estimate  the  frequencies  of  pure  sinusoids  in  white  noise 
from  short  data  records. 

^The  Gaussian  assumption  may  be  avoided  in  the  case  of 
correlation  constraints.  Working  directly  with  probability 
densities  the  Gaussian  form  may  be  derived  as  that  which 
maximizes  the  entropy  [10,  p.  944]* 
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As  demonstrated  by  Burg  [ll],  if  the  entropy  is  subsequently 
maximized  subject  to  correlation  constraints^ 


■  •  /: 


g(e)  e^n0  d0/2ir  ;  n=0,1,...,P 


(2.8) 


one  may  derive  the  AR(P)  form  for  g(8)  as  given  by  Equation 
(2.3).  The  AR(P)  form  together  with  the  constraint  Equa¬ 
tions  (2.8)  are  then  sufficient  to  yield  the  Yule-Walker 
Equations  (2.3)  from  which  the  model  parameters  may  be  de¬ 
termined.  If  cepstral  constraints^  are  employed  in  place  of 
correlation  constraints  the  spectrum  maximizing  Equation 
(2.7)  has  an  MA  form  while  both  correlation  and  cepstral 
constraints  lead  to  an  ARMA  model.  The  Pisarenko  model  is 
"justified"  by  deriving  it  as  the  minimum  energy  solution 
under  correlation  constraints^,  excepting  the  energy  (a  =  0) 
constraint  [12]. 

Another  principle  discussed  in  the  recent  literature  is 
the  "principle  of  minimum  cross-entropy"  [13].  Introduced 
by  Kullback  (under  the  name  "directed  divergence")  as  an 


^The  values  on  the  left-hand  side  are  given  in  terms  of 
the  data;  for  example,  by  Equation  (2.6). 

^These  place  constraints  directly  on  the  "cepstrum"  (or 
log  power  spectrum)  and  are  expressed  by  Equations  (2.8)  if 
g(d)  is  replaced  by  its  logarithm  while  the  left-hand  side 
values  are  expressed  in  terms  of  the  data. 

^It  may  also  be  related  to  the  maximum  entropy  prin¬ 
ciple  by  noting  that  the  AR(P)  model  approaches  the 
Pisarenko  model  as  r0  is  decreased  to  the  point  where  the 
correlation  matrix  becomes  singular  [7,  p.  355 3 * 


information  measure  [14],  it  has  a  number  of  interesting 
properties  neatly  collected  in  [15]*  In  terms  of  proba¬ 
bility  densities  the  cross-entropy  is  given  by 

S(q,p)  =  J  q(x)  ln[q( x)/p( x) ]  dx  (2.9) 

and  measures  the  expected  information  for  discrimina  ion® 
per  observation  from  q(x)  [14]»  A  symmetric  version  of  this 
measure,  S(q,p)  +  S(p,q),  was  introduced  earlier  by  Jeffreys 
[16]  who  emphasized  the  invariance  of  this  measure  with 
respect  to  coordinate  transformations;  unlike  entropy, 
cross-entropy  shares  this  important  property. 

As  an  inference  procedure,  minimum  cross-entropy  analy¬ 
sis  requires  a  prior  estimate  of  the  density,  p(x),  as  well 
as  new  information  in  the  form  of  constraints  and  derives  a 
new  posterior  estimate  of  the  density,  q(x),  by  minimizing 
S(q,p)  subject  to  the  constraints  [17]-  In  the  case  that 
the  prior  density  is  uniform  the  procedure  is  equivalent  to 
maximum  entropy;  with  correlation  constraints  the  posterior 
density  is  found  to  be  Gaussian  AR(P)  with  parameters  satis¬ 
fying  the  Yule-Walker  Equations  (2.5)» 

®Pully,  S(q,p)  is  said  to  measure  the  expected  informa¬ 
tion  for  discrimination  in  favor  of  the  (correct)  hypothesis 
that  the  density  is  q(x)  and  against  the  (competing)  hypoth¬ 
esis  that  the  density  is  p(x)  per  observation  from  q(x). 


Parameter  Estimation 


The  foregoing  discussion  leaves  the  impression  that  the 
correct  path  to  formation  of  a  spectral  estimate  is  clear: 
simply  select  a  guiding  principle  (undoubtedly  related  to 
the  notion  of  entropy),  gather  the  available  information, 
and  solve  the  well  defined  mathematical  problem  that  re¬ 
sults.  Seldom  is  the  practical  situation  so  simple. 

Typically  the  numerical  constraints  are  not  given  con¬ 
veniently,  say,  in  terms  of  exact  knowledge  of  the  autocor¬ 
relation  function  at  equally  spaced  lags.  More  often,  only 
a  few  irregularly  spaced  noise  corrupted  samples  of  the  time 
series  are  available;  from  this  data  the  numerical  con¬ 
straints  must  be  estimated.  Even  when  permitted  the  luxury 
of  bountiful  regularly  spaced  and  noise-free  data, numerous 
difficulties  remain.  Assuming  a  maximum  entropy  principle, 
should  estimates  of  the  autocorrelation,  cepstral,  or  some 
other  numerical  constraints  be  formed?  How  should  these 
estimates  be  formed  and  how  many^  of  them  should  be  formed? 

The  Yule  AR(P)  estimation  procedure  outlined  at  the 
beginning  of  this  chapter  provides  one  solution:  having 
selected  the  model  as  AR  and  its  order  as  P,  form  the  biased 
autocorrelation  lag  estimates,  Equation  (2.6),  and  use  these 


^This  is  the  problem  of  order  determination.  Various 
estimators  of  the  order  parameter,  based  upon  notions  of 
information  theory,  have  been  proposed  and  discussed  by 
Akaike  [18]  and  Pareen  [19.  20]  among  others.  Often  the 
order  parameter  is  selected  simply  upon  the  basis  of  experi¬ 
ence  with  the  phenomenon  under  study. 
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as  if  they  were  the  true  values.  These  autocorrelation  lags 
then  uniquely  determine  the  AR(P)  model  parameters  (and 
vice  versa)  via  the  Yule-Walker  Equations  (2.5).  This  de¬ 
scription  is  explicit  but  fails  to  provide  significant  in¬ 
sight  as  to  why  it  might  be  good.  The  formulation  may  be 
derived  from  a  variety  of  viewpoints,  each  with  its  own 
merit  and  yielding  greater  understanding  of  the  procedure. 

Linear  Prediction  (LP)  theory  leads  to  one  derivation 
of  this  formulation  [21 J.  In  this  derivation  the  AR  model 
is  viewed  as  a  predictor  and  the  model  parameters  are  deter¬ 
mined  to  minimize  the  prediction  error 

P 

en  =  xn  “  xn  =  xn  +  £  xn-m  (2.10) 

m=l 

in  a  mean-square  sense.  Depending  upon  the  details  of 
treatment  of  the  ends  of  the  data  record  one  may  derive  the 
Yule-Walker  procedure  (also  known  as  the  "autocorrelation  LP 
method")  or  a  variant  known  as  the  "covariance  LP  method". 
Both  of  these  methods  have  their  proponents.  The  Linear 
Prediction  theory  is  very  similar  to  Yule's  original  consid¬ 
erations  in  which  the  en  are  viewed  as  random  driving  dis¬ 
turbances  to  the  order  inhomogeneous  difference  Equation 
(2.10) . 

Other  variants  of  the  autocorrelation  LP  method  are 
based  upon  a  recursive  lattice  structure  Tor  the  prediction 
filter  [22].  In  addition  to  the  "forward"  predictor  Ap(z), 
these  variants  consider  a  "backward"  predictor,  Bp(z);  both 
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predictors  are  characterized  by  the  set  of  reflection  coef¬ 


ficients  |kn;  n=1 ,2, . . . ,Pj  according  to 


An( z)  =  An-1 (z)  +  kn 

z“1fin_1 (z)  ; 

A0( z)  =  1 

(2.11a) 

Bn(z)  =  z^B^  (z)  + 

^n  ^n-1 ( z  )  * 

B0(z)  =  1 

(2.1 1b) 

The  z-transform  of 

the  forward 

prediction 

error  process 

after  n  filtering  stages  is  simply  An(z)  X(z);  similarly  the 
z-transform  of  the  backward  prediction  error  process  is 
Bn(z)  X(z).  Mean-square  criteria  are  applied  to  the  forward 
and  backward  error  processes  to  obtain  a  variety  of 
estimators  for  the  reflection  coefficients;  one  of 
particular  importance,  due  to  Burg  [23],  determines  kn  to 
minimize  the  sum  of  the  variances  of  the  forward  and 
backward  error  processes  after  n  filtering  stages.  For 
truely  ergotic  processes,  all  these  AR  estimation  procedures 
are  asymptotically  equivalent  to  the  autocorrelation  LP 
method  for  large  values  of  N;  as  parameter  estimation 
procedures  these  methods  are  most  important  for  problems 
involving  mildly  nonstationary  data  of  limited  quantity. 

In  addition  to  these  various  "minimum  mean  square  pre¬ 
diction  error"  formulations,  another  important  derivation  of 
the  Yule  procedure  is  due  to  Itakura  and  Saito  [24].  As¬ 
suming  an  AR(P)  model  for  the  zero-mean  stationary  Gaussian 
process,  they  employ  the  maximum  likelihood  method  and  show 
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that  the  solution  is  obtained,  asymptotically  for  large  N, 
by  minimizing  a  "spectral  matching  criterion" 


Uf,g) 


lif(o)/g(o)]  -  m[f(e)/g(8)J  -  ij  de/2* 


(2.12) 


where  f(0)  is  the  Schuster  periodogram  given  by  Equation 

(2.1). 

It  is  readily  verified,  by  differentiating  I(f,g)  with 
respect  to  the  parameters  of  g(0),  that  the  minimum  is  ob¬ 
tained  when  the  correlation  matching  property 


f(e)  e*-n0  do/2ir  =  /  g(0)  e^n0  d0/2ir 


(2.13) 


is  satisfied  for  n=0,1,...P.  By  recognizing  the  left-hand 
side  as  the  lag  product  autocorrelation  estimates 


rn  =  /  f(0)  eine  d0/2w  (2.14) 

—IT 

the  correlation  matching  property  leads  easily  to  the  Yule- 
Walker  Equations  (2.5) »  see  [25,  pp.  445-6].  Recently  Kay 
[26]  has  developed  another  variant  by  similarly  applying  the 
maximum  likelihood  method  to  zero-mean  stationary  Gaussian 
AR(P)  processes  but  eliminating  the  large  N  approximation; 
again  this  variant  treats  the  problem  of  limited  data. 

The  functional  (2.12),  although  it  is  usually  attrib¬ 
uted  to  Itakura  and  Saito  in  the  current  speech  literature, 
was  apparently  first  developed  by  Pinsker  [27].  Assuming 


only  that  the  two  processes  are  zero-mean  and  Gaussian, 
Pinsker  showed^® 

S(p,q)/N  =  I(p,q)/2  (2.15) 

This  theorem  provides  an  information  theoretic  interpreta¬ 
tion  of  the  Itakura-Saito  spectral  matching  criterion. 
Moreover,  from  a  functional  inference  point  of  view,  one 
might  derive  the  Yule-Walker  procedure  by  replacing  q  by  an 
assumed  AR(P)  spectral  model,  g(0),  replacing  p  by  a  rough 
spectral  estimate  provided  by  f(0),  and  then  minimizing 
I(f.g). 

The  last  derivation  should  be  contrasted  with  the  mini¬ 
mum  cross-entropy  development  discussed  earlier.  In  that 
formulation  the  AR(P)  form  was  derived  from  given  correla¬ 
tion  constraints  while  this  formulation  derives  the  cor¬ 
relation  constraints  from  the  given  AR(P)  form.  Both 
developments  employ  (different)  prior  estimates  and  minimize 
a  measure  of  information  divergence  between  the  prior  and 
posterior  estimates;  however,  the  information  divergence  is 
not  a  symmetric  measure  and  the  unknown  (posterior)  estimate 
appears  as  the  second  argument  in  the  current  formulation 


^The  notation  is  somewhat  abused  here.  On  the  left 
p  and  q  represent  the  joint  probability  densities  of  N  con¬ 
secutive  random  variables;  on  the  right  p  and  q  are  power 
spectral  density  functions. 
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while  it  appears  as  the  first  argument  in  the  minimum  cross- 
entropy  development.  Nonetheless,  the  resultant  procedures 
are  both  the  same  as  the  Yule  procedure.  in  the  next  chap¬ 
ter  a  variant  of  this  last  derivation  will  be  considered. 

Noise  Corruption 

The  problem  of  noise  corruption  to  the  observations 
pervades  estimation  problems.  Generally  all  useful  estima¬ 
tors  are  at  least  mildly  tolerant  of  noise  corruption  while 
their  performance  degrades  if  the  corruption  becomes  par¬ 
ticularly  severe.  The  most  common  problem  considered  is 
that  of  an  additive  independent  noise  process;  this  problem 
is  of  considerable  importance  in  practical  applications. 

Upon  initial  reflection,  the  problem  of  estimating  the 
parameters  of  both  the  noise  and  signal  processes  from  time 
series  observations  alone  may  seem  impossible.  Indeed,  the 
problem  of  determining  the  individual  variances  of  two  inde¬ 
pendent  additive  zero-mean  stationary  white  Gaussian  proces¬ 
ses  is  completely  confounded  regardless  of  the  quantity  of 
data  available.  However,  if  one  process  is  non-Gaussian, 
estimates  of  third  and  higher  order  statistics  can  be  useful 
in  estimating  these  lower  order  statistics.  Par zen  discus¬ 
ses  the  use  of  the  "bi spectrum"  to  estimate  the  spectrum  of 
a  non-Gaussian  process  in  additive  independent  white  Gaus¬ 
sian  noise  [28]. 


Vhen  both  processes  are  Gaussian  the  problem  is  not 
always  confounded.  Since  the  sum  of  two  additive 


independent  ARMA  processes  is  also  an  ARMA  process  one  might 
hope  to  find  estimators  for  the  parameters  of  the  two  addi¬ 
tive  processes  when  the  number  of  parameters  for  the  com¬ 
bined  process  is  not  exceeded  by  the  total  number  of 
parameters  of  the  two  processes.  For  example,  Pagano  [29 J 
discusses  the  problem  of  estimating  the  P  +  2  parameters  of 
additive  AR(P)  and  white  processes  by  first  estimating  the 
2P  +  1  parameters  of  a  single  equivalent  ARMA(P,P)  process 
and  then  using  these  2P  +  1  estimates  to  initialize  a  pro¬ 
cedure  for  estimating  the  originally  sought  P  +  2  parame¬ 
ters;  it  seems  critical  however  that  the  order  of  the  AR 
process  does  not  degenerate  (i.e.  is  actually  nonzero). 

This  latter  problem  is  fairly  close  in  spirit  to  the 
problem  considered  in  the  following  chapters.  There  the 
signal  and  noise  processes  are  additive,  independent,  and 
zero-mean  Gaussian;  moreover,  the  signal  process  is  AR(P). 
The  problem  may  seem  more  complex  because  the  noise  process 
need  not  be  white;  however,  a  considerable  simplification  is 
achieved  because  the  noise  process  spectral  density  (hence, 
all  its  statistics)  is  assumed  to  be  known  in  addition  to 
the  time  series  observations.  In  practice  the  noise  statis¬ 
tics  are  estimates  provided  by  other  observations  but  the 
large  amount  of  data  available  for  these  estimates  makes 
them  quite  reliable. 


Noiae  Filtering 

Wiener  [30]  considered  the  intimately  related  problem 
of  extrapolating  a  time  series  from  noise  corrupted  obser¬ 
vations.  When  the  zero-mean  signal  and  noise  processes  are 
additive  and  independent  with  known  power  spectral  density 
functions  (g(8)  and  p(e)  respectively)  then  the  minimum 
variance  linear  extrapolating  filter  is  the  Wiener  filter 
whose  frequency  response  characteristic  is 

h( e )  =  g(0)/[g(e)  +  n(0)]  (2.16) 


This  is  sometimes  referred  to  as  the  unrealizable  Wiener 
filter  since  it  is  noncausal;  the  corresponding  impulse 
response  function  extends  both  backward  and  forward  in  time 
to  infinity.  It  is  easy  to  show  that  the  variance  of  the 
extrapolation  can  only  be  reduced  to  zero  if  the  support  of 
the  signal  spectrum  has  a  null  (or  zero-measure)  inter¬ 
section  with  the  support  of  the  noise  spectrum;  in  this  case 
the  frequency  response,  H(e),  will  be  unity  on  the  support 
of  g(0)  and  zero  elsewhere.  Others,  most  notably  Kalman 
[31],  have  since  extended  and  refined  Wiener's  pioneering 
work. 

A  common  procedure  for  dealing  with  additive  noise  is 
to  first  form  a  realizable  estimate  of  the  Wiener  filter  (or 

A 

some  other  "optimal"  filter),  H(0),  and  apply  it  to  the 


noise  corrupted  observations.  The  resulting  data  are  then 
treated  as  noise-free  observations  of  the  signal  process  and 


standard  estimation  procedures  are  employed  to  obtain  an 
estimate  of  the  signal  spectrum.  When  the  noise  spectrum, 
n(0),  is  known  this  procedure  involves  some  mildly  circular 
reasoning  since  Equation  (2.16)  indicates  that  knowledge  of 
H(e)  is  equivalent  to  knowledge  of  g(0).^  Nonetheless,  this 
process  has  been  demonstrated  to  be  advantageous  in  speech 
analysis  and  other  applications;  a  survey  of  these  methods 
may  be  found  in  [32]. 

Much  recent  effort  [33-39]  has  concentrated  upon  imple- 
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mentation  structures  and  estimation  procedures  for  H(0); 
typically  these  procedures  employ  side  information  in  ad¬ 
dition  to  the  noise  corrupted  time  series  observations. 
Often  the  methods  are  nonlinear  and  time-varying  with  both 
theoretical  and  heuristic  foundations.  Regardless  of  the 
technique,  one  may  always  subsequently  define  a  short-time- 
invariant  linear  equivalent  frequency  response  character¬ 
istic  in  terms  of  the  short-time  input  and  output  signal 
z-transforms,  X(z)  and  Y(z),  by 

H(0)  =  Y(ei0)/X(e10)  (2.17) 


Hence  we  would  have  g  =  ^H/(1-H).  The  conceptual 
difficulties  may  be  circumvented  by  considering  the  overall 
noise  cancelling  f  ilter/spectral  estimation  scheme  as  a 
single  estimation  procedure;  especially  since  the  procedure 
usually  does  not  employ  (2.16)  to  form  the  final  estimate  of 
the  signal  spectrum. 


One  convenient  categorization  distinguishes  frequency 
domain  methods  [33-36]  from  time  domain  methods  [37-39]* 
Among  the  frequency  domain  methods,  the  noise  cancelling 
filter  frequency  response  characteristic  usually  appears 
explicitly;  the  simpler  (and  less  heuristic)  methods  present 

A 

H(e)  as  a  function  of  the  short-time  signal  to  noise  spec¬ 
tral  density  ratio  estimate^ 


SNR( 0; a)  =  { [f (0) ]“  *  [n(e)]*i/[n(6)]a  (2.18) 


Two  important  classes  of  filter  response  characteristics  are 
the  subtraction  class  given  by^3 


H^eja.P)  =  [SNR(  0; <*) / [  1  +  SNR(0;«)]}P  (2.19) 


and  the  soft  suppression  class  given  by 


H  2(e;«,P)  =  {[1  +  H1(0;a,1/2)]/2}{®(0;o’,(3)/[l  +*(0;«,p)]} 

(2.20a) 


^Equation  (2.18)  employs  the  monus  function,  defined 
by  x-**y  =  (x-y  +  |x-y|  )/2,  to  insure  a  nonnegative  result. 

^Various  special  frequency  response  characteristics 
are  worth  separate  mention  here.  The  Wiener  filter  [30] 
frequency  response  is  fi1(0;1,l).  The  power  subtraction 
filter  and  the  magnitude  subtraction  filter  [35]  have  fre¬ 
quency  response  characteristics  1^(0;  1,1/2)  and  £^(0;  1/2,1) 
respectively.  Finally,  the  soft  suppression  class  due  to 
McAulay  and  Malpass  [36]  has  the  frequency  response 
$2 ( ® 5  1  » P)  • 
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where 


*(0;»,p)  =  exp[-pj  I0[2  Jp[l  +  SNR(  0;  a,i  ]  ] 


(2.20b) 


and  I0[-]  denotes  the  zeroth  order  modified  Bessel  function 
of  the  first  kind.  These  "suppression  rules"  are  plotted 
for  selected  values  of  a  and  0  as  a  function  of  SNR(0)  in 

Figure  1  . 


Effect  on  Resolution 


In  speech  applications,  vocal  tract  resonances  are  not 
extremely  sharp  and  are  moderately  well  separated  in  fre¬ 
quency;  consequently  one  is  generally  concerned  with  ac¬ 
curate  estimation  of  the  spectral  shape  and  high  resolution 
estimation  is  not  a  priority.1^  In  other  applications  (such 
as  sonar,  radar,  and  medicine)  accurate  frequency  estimation 
and  resolution  of  discrete  ("line")  and  narrowband  spectra 
are  issues  of  fundamental  importance.  Periodogram  and 
Blackman-Tukey  spectral  estimates  have  a  fundamental  fre¬ 
quency  resolution  limit  determined  by  the  length  of  the 
observation  interval;  AR  estimators  have  become  quite  popu¬ 
lar  due,  in  part,  to  their  greatly  improved  resolving  power. 


1  4., 

^Hence,  even  very  low  resolution  methods  that  divide 
the  (  4  kHz)  voice  bandwidth  into  fewer  than  two  dozen 
"channels"  can  be  quite  effective. 
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Figure  1.  Noise  Filter  Characteristics 


Still,  the  resolution  (as  well  as  other  performance  indi¬ 
cators)  varies  among  the  different  AR  estimators  and,  for 
each,  is  influenced  by  a  variety  of  factors. 

Noise  corruption  is  one  of  the  important  factors 
limiting  the  resolving  power  of  AR  estimators.  Several 
authors  have  considered  the  problem  of  estimating  the  pa¬ 
rameters  of  a  fixed  number  of  sinusoids  from  discrete-time 
observations  corrupted  by  zero-mean  additive  white  Gaussian 
noise  of  unknown  variance.  For  this  specialized  problem  the 
Cramer-Rao  performance  bounds^  may  be  computed  [40 j.  As  is 
well  known,  the  complicated  nonlinear  maximum  likelihood 
estimation  procedure  will  achieve  these  bounds;  Tufts  and 
Kumaresan  [41 j,  using  AR  estimation  procedures,  have  de¬ 
veloped  computationally  simpler  high  resolution  frequency 
estimators  that  nearly  achieve  these  bounds  while  Cadzow, 
et.  al.  [42]  claim  still  better  performance  using  a  singular 
value  decomposition  (SVD)  approach.  In  many  practical  cir¬ 
cumstances  additional  information  may  be  available  so  that 


^In  general,  the  Cramer-Rao  bounds  indicate  the  mini¬ 
mum  variance  a  parameter  estimate  can  achieve  [45]-  An 
estimate  achieving  the  minimum  variance  is  an  "efficient" 
estimate.  In  [40]  the  bounds  upon  an  unbiased  frequency 
estimate  are  considered  (they  depend  upon  the  assumed 
distribution  as  well  as  the  number  of  data  points)  and  are 
presented  as  a  function  of  the  signal  to  noise  ratio.  In 
[44],  the  efficiency  loss  of  any  method  based  upon  the  use 
of  correlation  estimates  instead  of  the  original  data  is 
studied. 
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these  bounds  may  be  3X'**eded ;  ^  ^  for  example.  Quirk  and 
Liu  [45j  describe  a  simpxe  filtering  and  decimation  scheme 
(which  employs  knowledge  of  the  frequency  bands  in  which  the 
sinusoids  are  located)  that  improves  the  resolution  of  (any) 
subsequent  AR  estimator.  In  a  similar  vein,  adaptive  pre¬ 
filters  (that  employ  a  reference  process  correlated  with 
either  the  signal  or  noise  portion  of  the  objective  process, 
but  not  both)  have  been  devised  to  "enhance"  narrowband 
signals  in  noise  [46 J. 

Quantization  and  Computation 

While  spectral  estimation,  per  se,  is  not  concerned 
with  the  problems  of  quantization  and  computation,  the  ulti¬ 
mate  utility  of  an  estimation  procedure  can  depend  strongly 
upon  these  (and  other)  issues.  If  the  procedure  explicitly 
recognizes  that  only  one  of  a  finite  predefined  set  of 
conclusions  can  be  reached,  the  situation  is  sometimes  dis¬ 
tinguished  by  referring  to  the  "detection"  (instead  of  the 
"estimation")  problem. 

In  many  digital  speech  recognition  and  communication 
systems  the  goal  of  spectral  analysis  is  to  solve  a  detec¬ 
tion  problem;  in  addition,  the  system  designer  must  solve 
the  problem  of  selecting  the  best  finite  set  of  models  to 


16Mo  re  precisely,  the  true  bounds  are  reduced  by  the 
availability  of  additional  information.  Consequently  new 
estimators  that  account  for  this  additional  information  can 
be  devised  that  outperform  (in  terms  of  variance)  any  esti¬ 
mator  that  does  not  account  for  the  additional  information. 


employ.  Until  recently,  these  systems  would  find  the  solu¬ 
tion  to  an  estimation  problem  and  then  employ  a  (somewhat  ad 
hoc)  quantization  procedure  to  select  a  model  from  among  the 
finite  set.  If  the  number  of  models  in  the  finite  set  was 
sufficiently  large,  this  procedure  could  be  quite  effective; 
however,  one  measure  of  goodness  for  the  finite  set  of 
models  is  often  how  few  models  are  in  the  set. 

In  the  past  decade  technological  advances  have  permit¬ 
ted  the  use  of  increasingly  complex  computational  procedures 
while  still  meeting  size/cost/power  constraints  imposed  by 
the  application.  Consequently  more  sophisticated  and  ef¬ 
fective  (but  previously  unmanagable)  techniques  for  esti¬ 
mation/detection  and  quantization  of  spectral  models  have 
been  studied  in  earnest.  The  numerous  variants  of  a  class 
of  techniques  generally  referred  to  as  "vector  quantization" 
[47-53]  have  recently  achieved  considerable  success  by  re¬ 
ducing  the  finite  number  of  models  by  about  9  orders  of 
magnitude  with  only  slight  degradation  in  other  measures  of 
system  performance. 

Many  of  these  vector  quantization  techniques  are 
founded  upon  minimization  of  the  asymptotic  information 
divergence  I(f,g).  Of  considerable  interest  in  the  use  of 
this  measure  is  the  triangle  equality  property;  if  g(8) 
minimizes  I(f,g)  over  the  set  of  all  stable  AR(P)  models  and 
h(0)  is  any  other  model  in  a  (possibly  finite)  subset  then 

I(f,h)  =  I(f,g)  +  I(g,h)  (2.21 ) 
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As  a  consequence  of  this  property  one  may  solve  the  detec¬ 
tion  problem,  which  minimizes  l(f,h),  by  first  solving  the 
estimation  problem,  which  minimizes  I(f,g),  and  then  solving 
the  quantization  problem  which  minimizes  I(g,h). 

Remarks 

The  general  problem  of  spectral  estimation  has  been 
discussed;  this  discussion  has  emphasized  issues  and  methods 
associated  with  autoregressive  estimation.  Autoregressive 
spectral  models  are  important  in  numerous  practical  applica¬ 
tions;  consequently  they  have  received  considerable  at¬ 
tention  in  the  literature.  The  AR  form  may  be  derived  from 
either  the  maximum  entropy  or  the  minimum  cross-entropy 
principle  when  correlation  constraints  are  considered;  al¬ 
ternatively  the  AR  form  may  be  assumed  and  correlation  con¬ 
straints  derived  using  a  linear  prediction  formulation.  The 
correlation  constraints,  together  with  the  AR  form,  are 
sufficient  to  derive  the  Yule-Walker  equations  which  relate 
the  model  parameters  to  the  prescribed  correlation  values. 

The  asymptotic  maximum  likelihood  formulation  of 
Itakura  and  Saito  assumes  an  AR  form  and  derives  the  corre¬ 
lation  constraints;  in  the  course  of  this  development  a 
"spectral  matching  criterion”  is  minimized.  The  earlier 
derivation  by  Pinsker  of  this  spectral  matching  criterion 
from  an  asymptotic  information  divergence  formulation  makes 
clear  that,  while  the  AR  form  is  necessary  to  derive  the 
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Yule-Walker  equations,  the  spectral  matching  criterion  is 
applicable  independant  of  the  spectral  model  form. 

Noise  corruption  pervades  estimation  problems  and  use¬ 
ful  estimators  are  generally  at  least  mildly  tolerant  of 
additive  noise.  Often  additional  data  is  available  to  help 
characterize  or  distinguish  the  noise  and  signal  processes; 
many  estimation  problems  are  concerned  with  the  development 
of  effective  and  computationally  feasible  methods  for  in¬ 
corporating  this  additional  information.  A  common  pro¬ 
cedure,  employed  when  an  accurate  noise  spectrum  estimate  is 
known,  first  applies  an  estimated  noise  cancelling  filter  to 
the  corrupted  data  and  then  uses  the  output  as  "noise-free" 
data  from  which  to  estimate  the  signal  spectrum.  Ultimately 
the  effect  of  noise  corruption  will  be  to  decrease  the  best 
performance  possible  with  any  spectral  estimator. 

In  the  following  chapters  a  new  spectral  estimator  is 
developed.  As  is  common,  the  fundamental  observations  are 
assumed  to  be  equally  spaced  samples  of  a  zero-mean  station¬ 
ary  Gaussian  time  series  corrupted  by  additive  independent 
zero-mean  stationary  Gaussian  noise  of  known  power  spectral 
density,  p(0).  This  problem  occurs  in  many  applications 
involving  speech  analysis  (as  well  as  others)  wherein  the 
noise  spectrum  is  estimated  from  data  taken  during  speech 
inactivity. 

The  amount  of  data  available  to  estimate  the  signal 
spectrum  is  usually  limited  by  the  nonstationary  character 
of  speech;  the  speech  statistics  are  usually  stationary  only 


over  very  short  time  intervals  varying  in  duration.  One 
study  [54]  has  observed  speech  waveforms  and  subjectively 
judged  that  the  duration  for  which  a  segment  may  be  con¬ 
sidered  stationary  varies  from  about  4  ms.  to  over  360  ms. 
with  most  of  the  distribution  contained  in  the  range  of  12 
ms.  to  174  ms.;  most  speech  analysis  systems  employ  a  fixed 
analysis  interval  approximately  20  to  25  ms.  in  duration. 
The  use  of  a  fixed  analysis  interval  (with  no  particular 
attempt  at  optimum  time  alignment  of  end  points)  is  simply  a 
practical  method  of  limiting  the  computational  burden;  while 
suboptimal  spectral  estimates  are  thereby  achieved  for  long 
acoustic  events,  perhaps  the  most  severe  deleterious  effect 
is  the  slurring  of  very  short  events  and  transitions. 

In  order  to  employ  at  a  later  time  a  noise  estimate 
obtained  during  speech  inactivity,  the  noise  statistics  are 
assumed  to  remain  stationary  over  much  longer  time  inter¬ 
vals;  since  one  of  the  primary  noise  sources  is  ambient 
environmental  noise  acoustically  coupled  to  the  speech,  the 
validity  of  this  assumption  must  be  checked  in  each  situ¬ 
ation.  In  many  practical  circumstances  the  noise  is 
stationary  over  long  intervals;  for  example,  in  aircraft, 
the  noise  statistics  typically  vary  only  with  the  flight 
condition.  On  the  other  hand,  if  the  corrupting  noise  is 
another  speech  signal  the  assumption  of  long  term  noise 
stationarity  is  certainly  invalid. 


CHAPTER  III 


THEORETICAL  FORMULATION 


In  this  chapter  several  related  procedures  for  esti¬ 
mating  AR(P)  process  parameters  from  noise  corrupted  time 
series  observations  are  developed.  In  the  first  section  the 
problem  is  motivated  as  one  arising  in  speech  applica¬ 
tions.  In  the  next  section  an  ideal  formulation  is  discus¬ 
sed;  unfortunately  the  resulting  nonlinear  system  of 
equations  is  sufficiently  complicated  to  make  analytical 
solution  intractable.1  In  the  third  section  a  first  ap¬ 
proximation  to  the  ideal  formulation  is  developed  and  shown 
to  be  essentially  equivalent  to  the  noise  filtering  pro¬ 
cedures  discussed  in  Chapter  II.  In  the  fourth  section  a 
second,  improved,  approximate  formulation  employing  a 
weighted  information  measure  is  developed; ^  some  important 


Numerical  solution  may  be  feasible  in  some  cases  but 
this  is  not  investigated  in  the  present  work. 

p 

cThis  weighted  information  formulation  assumes  a  cen¬ 
tral  role  in  this  work.  In  fact,  this  was  the  original 
foundation  and  was  developed  heuristically  following  the 
work  of  Chu  and  Messerschmitt  [55»  56].  The  theoretical 
foundation  (as  an  approximation  to  the  "ideal"  formulation) 
was  subsequently  developed  because  the  heuristic  development 
could  only  specify  the  weight  function  qualitatively  and  a 
more  quantitative  characterization  was  required. 


properties  of  the  weighted  information  measure  are  derived 
in  the  fifth  section.  Finally,  the  last  section  reflects 
upon  these  formulations,  their  relationship  to  other  estima¬ 
tion  procedures,  and  problems  of  spectral  estimation  and 
speech  analysis  to  which  they  may  be  applied. 

Application  to  Speech  Analysis 

Acoustic  events  in  speech  are  often  modeled  a3  a  white 
zero-mean  Gaussian  stationary  excitation  of  a  linear  system. 
The  linear  system  response  is  usually  identified  with  the 
vocal  cavity  response  which  depends  upon  the  position  of 
speech  articulators  (tongue,  lips,  teeth,  etc.);  the  exci¬ 
tation  is  usually  assumed  to  be  physically  localized  al¬ 
though  its  position  may  vary  with  different  speech  events. 

The  linear  system  model  may  be  criticized  in  various 
ways;  still  it  has  had  considerable  success  in  practical 
situations.  The  particular  case  of  an  AR  (or  all-pole 
linear)  system  model  can  be  justified  on  the  basis  of  a 
lossless  acoustic  tube  of  varying  cross-sectional  area.  The 
analogy  of  an  acoustic  tube  with  the  oral  or  nasal  cavity 
alone  is  clear;  however,  some  speech  sounds  reflect  the 
combined  response  characteristics  of  the  oral  and  nasal 
cavities  indicating  that  a  full  ARMA  model  would  be  more 
appropriate.  A  more  complete  discussion  of  acoustic  tube 
modeling  of  the  vocal  tract  may  be  found  in  [21]. 

Based  upon  the  considerable  success  of  AR  models  in 
speech  applications,  as  well  as  the  physical  analogies  that 
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may  be  drawn  between  AR  models  and  the  vocal  tract  via 
acoustic  tube  modeling,  the  AR  speech  model  is  adopted  here. 
In  most  applications  the  deleterious  effects  of  the  pressure 
transducer,  analog  amplifier,  anti-aliasing  prefilter,  and 
the  digitizer  have  been  carefully  minimized  and  may  be  ig¬ 
nored.  Some  applications  permit  the  system  designer  to 
ensure  that  the  pressure  transducer  response  reflect  only 
the  speech  of  the  intended  speaker;  more  often,  conflicting 
goals  deny  the  designer  this  flexibility  so  that  the  micro¬ 
phone  transduces  other  ambient  environmental  acoustic  events 
that  appear  as  unwanted  "noise"  in  the  observed  signal. 
Consequently,  while  the  AR  model  is  adopted  for  the  speech 
spectrum,  it  is  inadequate  as  a  model  for  the  observed  sig¬ 
nal  spectrum. 

Some  ambient  noise  is  a  direct  environmental  response 
to  the  speech  itself  (e.g.  echoes)  or  is  short,  transient, 
and  generally  unpredictable  by  nature  (e.g.  a  gunshot, 
dropped  book,  engine  backfire,  cough,  etc).  Other  ambient 
noise  is  repetitive  (e.g.  machine-gun  fire)  or  steady  by 
nature  (e.g.  drone  of  engines,  rushing  air,  running  water, 
whine  of  a  turbine).  This  last  (steady)  type  of  noise  is 
the  primary  focus  of  many  speech  analysis  systems;  typically 
these  systems  exploit  the  steady  nature  of  the  noise  to 
determine  noise  statistics  during  speech  activity  from  sig¬ 
nal  observations  made  during  speech  inactivity.  With  multi¬ 
ple  transducers  (or  other  clever  system  design  techniques) 
the  statistics  of  a  much  broader  class  of  noises  may  be 


known  during  speech  activity.  In  the  following  it  is  only 
assumed  that,  during  each  analysis  interval,  the  noise  in 
the  primary  (objective)  observation  signal  be  zero-mean 
Gaussian  stationary  additive  and  independent  of  the  speech; 
the  noise  is,  therefore,  completely  characterized  by  a  spec¬ 
tral  density  function,  n(0),  which  is  assumed  to  be  known. 

The  goals  of  speech  analysis  are  many  and  varied.  In 
communications  the  goal  is  often  to  achieve  a  minimal  data 
rate  subject  to  a  quality  or  communicability  constraint.  In 
artificial  intelligence  the  goal  is  usually  to  "understand" 
the  speech  with  phonetic  or  written  transcription  often 
arising  as  an  intermediate  step.  Some  other  goals  include 
the  identification  of  the  speaker,  the  identification  of  the 
language,  translation  of  the  voice  of  one  speaker  to  that  of 
another  in  the  same  or  a  different  language,  and  the 
screening/diagnosis  of  disease  (e.g.  laryngeal  cancer). 
Spectral  estimation  is  at  the  foundation  of  speech  analysis 
for  all  these  goals  and  accurate  AR  model  estimation  in 
noise  is  fundamental  to  the  estimation  of  speech  spectra  in 
practical  environments. 

Ideal  Formulation 

Let 

h(e)  =  g(e)  +  M>(  0 )  (3.1) 
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be  the  observed  process  power  spectral  density  model  where 
jx(8)  is  the  known  additive  noise  process  spectrum  and  g(e) 
is  the  unknown  AR(P)  power  spectral  density  model  character¬ 
izing  the  signal  process;  see  Equation  (2.3)*  bet  f(8)  be 
the  Schuster  periodogram  defined  for  the  N  time  series  ob¬ 
servations  by  Equation  (2.t).  If  the  signal  and  noise  pro¬ 
cesses  are  independent  zero-mean  real  stationary  Gaussian 
processes  then  the  maximum  likelihood  method  is  asymp¬ 
totically  equivalent,  for  large  N,  to  minimizing  I(f,h)  with 
respect  to  the  AR(P)  process  parameters.  Any  parameter  set 
minimizing  I(f,h)  and  corresponding  to  a  stable  AR(P)  pro¬ 
cess  shall  be  considered  here  to  be  an  ideal  solution  to  the 
estimation  problem. 

This  formulation  of  the  estimation  problem  as  a  minimi¬ 
zation  problem  may  also  be  derived  from  an  information  theo¬ 
retic  viewpoint.  Let  F(0)  be  the  true  observed  process 
power  spectral  density  so  that  I(f,h)  represents  the  asymp¬ 
totic  information  divergence  between  the  true  spectrum  and 
an  arbitrary  model  spectrum.  Clearly  it  is  desirable  to 
find  the  model  h(0)  minimizing  I(f,h);  if  the  minimum  value 
is  zero  then  h(0)  =  f(0)  almost  everywhere.  Since  f(0)  is 
unavailable,  replace  it  by  a  rough  estimate,  f(0),  and  find 
h(0)  to  minimize  I(f,h). 

Minimization  of  I(f,h)  is  subject  to  several  inter¬ 
esting  interpretations;  the  maximum  likelihood  and  minimum 
information  divergence  interpretations  have  been  given 
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above,  a  third  noise  filtering  interpretation  is  now  pro¬ 
vided.  Notice  that  I(f,h)  =  I(Hf,g)  where  H(0)  is  the  fre¬ 
quency  response  of  the  Wiener  filter  given  in  Equations 
(2.16).  The  quantity  H(e)f(0)  may  be  interpreted  as  a  rough 
estimate  of  the  spectrum  of  a  process  obtained  by  passing 
the  observed  process  through  a  filter  whose  power^  spectral 
response  characteristic  is  H(0);  minimization  of  I(f,h)  = 
I(Hf,g)  may  then  be  understood  as  a  standard  LP  (or  maximum 
entropy,  etc.)  fit  to  the  noise  filtered  process.  Of 
course,  H(0)  is  not  known  but  is  a  function  of  the  unknown 
parameters  of  g(0);  one  must  simply  imagine  finding  a  pa¬ 
rameter  set  defining  H(0)  that  also  corresponds  to  the  best 
LP  fit,  g(0),  to  the  output  process. 

The  functional  I(f,h)  is  minimized  by  computing  its 
derivative  with  respect  to  each  parameter  of  g(o)  and  set¬ 
ting  the  result  to  zero.  For  an  arbitrary  parameter, £  , 
this  is 

J{[ H(e)  g(0)  -  H2(0)  f(0)]/g2(0)l  (0g(0)/a£)  de/2ir  =  0 

(3.2) 


^This  is  not  to  say  that  the  observed  process  is  passed 
through  a  Wiener  filter  whose  frequency  response  is  H(0). 
Recall  that  the  Wiener  filter  Is  designed  to  minimize  the 
mean-square  prediction  error;  the  output  process  doing  this 
does  not  have  the  signal  process  spectrum,  g(0),  but  instead 
the  spectrum  H(0)g(0).  Alternatively,  H(0)f(e)  may  be 
interpreted  as  a  rough  estimate  of  the  cross-spectrum 
between  the  input  and  output  processes  of  the  Wiener  filter. 
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Using  Eaua-.i  ons  (2.5)  and  (2.4),  the  partial  derivatives  of 
g(e)  are 


ag(e)/8<r2  =  g(e)/<r2 


(5* 5a) 


and,  for  1=1, 2 . P 


P 

9g(e)/3ai  =  -g2(e)  2am  cos[  ( l-m)0  ]/<r2 

m=0 

Defining^ 


V 


n 


(0)  f(0)  -  H(0) 


g(0)J  ein®  d0/2n 


(3-3b) 


(3.4) 


and  substituting  Equation  (3*3)  in  Equation  (3-2)  yields 


P 


P 


L 


(a,/'2)  Vt_m 


0 


(3-5a) 


and,  for  1  =  1 ,2, ...  ,P 

P 

X]  (am/<r2>  vi-rn  =  0  (3.5b) 

m=0 


while  a  little  further  manipulation  of  Equations  (3-5) 


^It  is  worth  noting  that  the  quantities,  Vn,  defined  by 
Equations  ( 3 - 4 )  are  the  components  of  the  gradient  vector  of 
I(Hf,g)  where  differentiation  is  defined  with  respect  to  the 
inverse  correlation  parametrisation  of  g(0);  see  Equation 

l  "X.  o  \  ^ 
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yields,  for  1=0,1, ...,P 


P 

E  a*  vi-m  =  0  (5,6) 

m=0 

The  symmetry  of  the  functions  f(0),  g(0),  and  H(0J  may  be 

used  to  demonstrate  that  V_n  =  Vn  while  it  is  easy  to  see 
that  Equations  (3-6)  are  satisfied  if 


Vn  —  0  ;  n— 0 , 1 ,  •  •  •  ,  P 


(5-7) 


To  show  that  Equations  (3*7)  must  be  satisfied  if  a 
stable  filter  is  to  be  obtained,  rewrite  the  system  of 
Equations  (3*6)  in  matrix  form  as 


• 

r  *i 

- 

-  - 

1  a^  •  •  •  ap— i  ap 

0  0  •••  0  0 

V0 

0 

ai  a2  •  • •  ap  0 

0  1  ...  0  0 

•  •  0  • 

V 

0 

• 

< 

.  •  •  • 

.  .  •  • 

. 

+ 

•  •  •  • 

•  •  •  • 

► 

• 

• 

= 

• 

• 

ap_'j  ap  •  •  •  0  0 

0  ap_ 2  •  •  •  1  0 

Vp-i 

0 

p 

►0 

o 

• 

• 

• 

o 

o 

1 _ 

0  ap— 1  •  •  •  ai  1 

_Vp  _ 

0 

-w 

J 

(3-8 

The  coefficients  of  a  stable  P-18^  order  predictor  (an;  n  = 
1,2,..., P-1 |  are  given  recursively  in  terms  of  a  stable  Pth 
order  predictor  according  to 


[I  +  kp  J]"1 


(3-9) 


where  1  ia  the  identity  matrix,  J  ia  the  reveraal  matrix 


0  0  •••  0  1 

0  0  •••  1  0 


•  • 

•  • 


(3-10) 


0  1  •••  0  0 

1  0  •••  0  0 


kp  =  ap  is  a  reflection  coefficient^  and 


[I  +  kp  J]"1  =  [I  -  kp  J]/(1  -  kp2) 


(3.11) 


Applying  the  nonsingular  transformation^  [I  +  kp  to 

Equation  (3-8)  does  not  change  the  solution  and  yields 


4 

1  ••• 

/X  A 

aj  &2  *'• 

•  • 

•  • 

•  • 

A  _ 

ap-i  o  ... 

0  0 


ap_1  0 
0  0 


0  ...  0  0 

1  ...  0  0 

•  •  • 
•  •  • 
•  •  • 

ap_2  •••  1  0 

A  A 

ap_^  •  • »  a^  I 


vp-i 

Vp 


(3.12) 


^These  are  the  same  reflection  coefficients  used  in  the 
forward-backward  recursion;  see  Equation  (2.11). 

^Bounded  input,  bounded  output  (BIBO)  stability  re¬ 
quires  and  is  guaranteed  by  the  condition  |kn|  <  1  for  n  = 
f,2,...,P  which  also  guarantees  that  the  indicated 
transformation  is  nonsingular. 


The  last  equation  shows  Vp  to  be  a  linear  combination  of 
V0 • Vi » • • • ♦ Vp-i  and  the  reduced  system 


r 


1  a^  •••  ap_j 

A  A  _ 

a  2  •••  u 

•  •  • 

•  •  * 

•  •  • 

+ 

_*P-1  0  •••  0 
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Vi 
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Vp-1. 
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J 

(3 
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is  of  the  same  form  as  Equation  (3*8)*  Consequently,  sta¬ 
bility  requires  that  each  Vn  be  a  linear  combination  of  the 
previous  V|  ,  1=0, 1 , . . . ,n-1 ,  while  the  final  reduced  system 
is  simply  VQ  =  0.  Hence,  if  only  stable  minima  of  I(f,h) 
are  sought  these  minima  must  satisfy  Equations  (3*7)  which 
may  be  rewritten,  for  n=0,1,...,P,  as 


t 


■l: 


H(0)  H(0)  f ( 0 )  ein®  de/2w  =  /  H(0)  g(0)  ein0  de/2ir  (3-14) 


This  is  a  highly  complicated  nonlinear  system  of  equa¬ 
tions  that  appears  to  be  very  difficult  to  solve  analyti¬ 
cally.  Note  that,  in  the  absence  of  noise,  jx(e)  =0  and 
H(0)  =  1  so  that  the  system  reduces  to  Equations  (2.13)  as 
expected;  in  this  case  it  is  well  known  that  the  system 
always  possesses  a  unique  stable  solution. 


In  general  no  admissable  solution  exists;  the  following 
example  will  serve  to  illustrate.  Consider  an  AR(0)  process 
corrupted  by  white  noise  of  known  variance  p.  The  system  of 


equations  reduces  to 


ir 

f{0)  de/2ir  =  <r2  +  H  (3.15) 

If  r0  2_  ^  the  system  is  solved  by  <r2  =  rc  -  k  which  yields 
the  minimum  value  I(f,h)  =  I(f,r0)  =0.  If  r0  <  n  the 
system  does  not  possess  a  real  solution;  however,  I(f,h)  is 
always  minimized  by  selecting  cr2  =  rQ  i  (i. 

Noise  Filtering  Formulation 

Since  Equations  (3*14)  appear  so  difficult  to  solve,  it 
is  natural  to  consider  alternate  formulations.  From  the 
observation  that  l(f,h)  =  I(Hf,g)  and  the  interpretation  of 
H(e)  as  the  power  spectral  response  of  a  noise  filter  a 
simple  and  reasonable  procedure  is  to  replace  H(e),  which 

/v 

depends  upon  unknowns,  by  an  estimate  H(e).  Several  classes 
of  estimates  have  been  presented  in  Equations  (2.19)  and 
(2.20). 

Once  the  data  has  been  processed  by  the  filter  with 
power  response  H(e)  a  "noise-free"  rough  estimate  is  avail¬ 
able 


f(e)  =  H( 0)  f(0)  (3.16) 

Then,  minimization  of  I(f,g)  =  I(Hf,g)  is  achieved  by  the 
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solution  to  the  equations 


fi 


It 

H(0)  f(6)  ein0  de/2ir  =  /  g(0)  ein0  de/2ir 

~nf or  n=0, 1 , . . . ,  P 


(5.17) 


This,  of  course,  leads  easily  to  the  Yule-Walker  equations 
with  the  difference  that  the  estimated  correlation  values 
are  now  given  by  the  left-hand  side  of  Eauation  (3*17);  the 
reader  is  urged  to  compare  this  equation  with  Equations 
(3.14)  and  (2.13). 

Weighted  Information  Formulation 

The  previous  approximate  formulation  encompasses  a  wide 
variety  of  estimation  procedures  that  have  been  studied  in 
recent  years.  If  f(0),  given  by  Equation  (3. 1 6) ,  is  a  good 
rough  estimate  of  the  noise-free  power  spectral  density  the 
resultant  model  parameters  can  be  expected  to  be  accurate. 
Consequently,  considerable  effort  has  been  expended  trying 

A 

to  find  the  best  form  of  H(0)  and,  ultimately,  the  best 
means  of  computing  the  correlation  values  on  the  left  hand 
side  of  Equation  (3.17). 

Generally  speaking,  any  estimate  can  be  expected  to  be 
more  accurate  if  there  is  less  corrupting  noise;  in  particu- 

a 

lar,  f(0)  can  be  expected  to  be  more  accurate  in  those  spec¬ 


tral  regions  where  the  signal  to  noise  density  ratio  is 

/A 

large.  Since  the  reliability  of  the  rough  estimate  f(0) 
varies  with  frequency,  the  criteria  for  fitting  a  model  to 


f(e)  should  reflect  this  variation  in  reliability.  The 
frequency  weighted  spectral  distance  measure  introduced  by 
Chu  and  Messerschmitt  [55*  56]  provides  precisely  the  re¬ 

quired  flexibility  for  such  a  criteria.  The  criteria  is 
derived  from  the  asymptotic  information  divergence,  I(f,g), 
by  noting  that  the  integrand  in  Equation  (2.17)  is  a  non¬ 
negative  error  measure;  the  frequency  weighted  variant  is 
obtained  by  introducing  a  multiplicative  nonnegative  weight 
function  to  the  integrand  of  I(f,g)  to  yield 

IT 

w(e){[f(e)/g(e)]  -  in[f(e)/g(e) ]  -  i|  de/2ir 
«■  (3.18) 

If  W(e)  is  constant,  minimization  of  Iy(f,g)  =  Iy(Hf,g) 
is  equivalent  to  minimization  of  I(f,g)  =  I(Hf,g).  To  re- 

A  , 

fleet  the  greater  reliability  of  f(e)  in  some  spectral  re¬ 
gions,  W(e)  should  be  selected  to  be  large  where  the  signal 
to  noise  density  ratio  is  large.  To  remain  consistent  with 
AR  estimation  procedures  that  work  well  in  the  absence  of 
noise,  H(0)  should  approach  unity  and  W(0)  should  approach  a 
constant  as  **(9)  approaches  zero.  Specific  procedures  for 
selecting  H(0)  have  been  studied  in  the  past  [32-39]  and 
important  examples  are  given  in  Equations  (2.19)  and  (2.20); 
the  above  considerations  provide  a  qualitative  understanding 
of  an  appropriate  selection  for  W(0)  but  a  more  specific, 
quantitative  understanding  is  required. 


r(f.«)  = 


To  minimize  Iy(Hf,g)  Equation  (3*18)  is  differentiated 
with  respect  to  the  parameters  of  g(0)  and  the  results  are 
set  to  zero.  The  procedure  is  the  same,  mutatis  mutandis, 
as  that  followed  for  minimizing  I(Hf,g)  and  yields  the 
system  of  equations 


I: 


tr  /•" 

w(0)  H(9)  f(e)  ein6  de/2ff  =  /  w(e)  g(0)  ein0  de/2TT 

-ir 


(3.19) 


Comparison  of  Equations  (3*19)  to  Equations  (3-14),  which 
result  from  the  ideal  formulation,  immediately  suggests  the 
required  quantitative  criteria  for  selecting  W(e).  Specifi¬ 
cally,  W(e)  should  be  selected  so  that,  at  least  approxi¬ 
mately, 


V(0)  =  H(0)  (3.20) 

and  H(8)  should  estimate  H(e).  This  selection  is  supported 
by  the  previous  heuristic  considerations  which  indicated 
that  W(e)  should  be  large  where  the  signal  to  noise  density 
ratio  is  large. 

Properties  of  the  Weighted  Information 

In  this  section  three  important  results  concerning  the 
weighted  information  measure,  Iy(f,g),  are  developed.  These 
results  also  apply  to  the  asymptotic  information  divergence, 
I(f,g),  as  a  special  case  where  W(0)  =  1.  The  first  result 
generalizes  the  triangle  equality  property  for  I(f,g),  see 
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Equation  (2.21);  that  this  property  generalizes  appro¬ 
priately  is  of  interest  to  the  use  of  the  weighted 
information  measure  in  place  of  the  (unweighted)  asymptotic 
information  divergence  for  vector  quantization. 

The  Kullback  information  number  and  the  asymptotic 
information  divergence  are  well  known  to  be  convex  with 
respect  to  general  classes  of  probability  and  spectral  den¬ 
sities.  With  the  appropriate  definition  for  convex  super¬ 
position  of  AR(P)  spectra,  the  second  important  result  is 
that  the  class  of  stable  AR(P)  spectra  is  convex  and  the 
weighted  information  measure  is  strictly  convex  with  respect 

n  A 

to  this  class. 1  As  a  consequence,  Iy(Hf ,g)  can  have  at  most 
one  local  minimum  with  respect  to  this  class;  moreover,  if 
such  a  minimum  exists  it  is  also  a  global  minimum. 

Finally,  the  third  result  shows  that  the  second  mixed 

/V 

partial  derivative  of  Iv(Hf,g)  defines  a  positive  definite 
quadratic  form.  This  shows  that  any  stable  solution  to 

A 

Equation  (3*19)  is  a  local  minimum  of  Iy(Hf,g);  this  could 
also  have  been  demonstrated  using  the  strict  convexity. 
Combined  with  the  previous  result  this  shows  that  Equation 

^A  set,  is  convex  if  it  always  contains  the  convex 
superposition  of  two  elements  in  the  set.  A  convex  super¬ 
position  is  a  map  x-z  =  CS(x^,X2;Y)  defined  for  0  <  Y  ^  1  and 
all  x^,X2  «  sucn  that  x-z  =  x^  if  Y  =  1  anT  x-z  =  X2 
if  V  =  0;  if  xj  =  X2  then  x-z  ~  X2  =  x ^  for  all  Y.  A  func¬ 
tion  f(x)  defined  on  a  convex  set  3P  is  said  to  be  convex  if 
Yf(x«)  +  (1-Y)  f(x2)  _>.  f(xj)  and  strictly  convex  if  equality 
implies  x^  =  X2« 


(3*19)  can  have  at  most  one  stable  solution  (although  un¬ 
stable  solutions  can,  and  often  do,  exist);  moreover,  if 
such  a  solution  exists,  it  is  the  global  minimum  among 
stable  AR(P)  spectral  models. 

The  question  of  existence  is  not  addressed  in  this  set 
of  results.  The  existence  of  a  stable  solution  to  Enuationr. 
(3*19)  is  assumed  but  remains  an  open  question  in  general; 
existence  can  be  demonstrated  in  special  cases,  e.g.  W(0)  = 
1  ,  while  experimental  results  are  discussed  in  Chapter  V. 
Because  the  proofs  are  nonconstructive,  they  do  not  assist 
with  the  question  of  existence  nor  do  they  provide  algo¬ 
rithms  for  computation  of  a  solution;  computational  pro¬ 
cedures  are  discussed  in  Chapter  IV.  It  is  worth  noting 
that  if  no  solution  to  Equations  (3*19)  exists  then,  since 

A 

Iy(Hf,g)  must  possess  a  minimum  in  the  closure  of  the  set  of 
stable  AR(P)  spectra,  the  minimum  occurs  as  a  limit  point  of 
the  set. 

To  simplify  the  following  discussion  the  set  of  stable 
AR(P)  spectra  shall  be  denoted  0Pp.  Each  element  of  the  set 
may  be  characterized  by  a  P+1 -tuple  of  real  parameter  values 
satisfying  appropriate  (stability)  criteria.  Pour  charac¬ 
terizations  of  are  presented  below: 

Predictor  Coefficients.  Let  Ap(z)  be  given  by  Equation 
(2.4)  with  all  roots  of  Ap(z)  inside  the  unit  circle.  Then 
( a,aj  ,ap,  •  •  • , ap)  denotes  an  arbitrary  element  of  #p  if 
or  >  0. 
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Reflection  Coefficients. 


Let  Ap(z)  be  given  by 
K»juati on  (2.11)  with  |kjj|  ^  ^  ^or  n=1,2,...,P.  Then 
( «r,k^  ,k2»  •  •  •  ,kp)  denotes  an  arbitrary  element  of  #p  if 
cr  >  0. 


Autocorrelation  Coefficients.  Let  the  real  symmetric 
Toeplitz  quadratic  form  given  by 

P 

T(x)  =  ^  r|m-n|  xm  xn  (3-21) 

m,n®0 

be  positive  definite.  Then  ( r0 ,  r^ , . . . , rp)  denotes  an  arbi¬ 
trary  element  of  £Pp. 

Inverse  Correlation  Coefficients.  Let 
P 

t/g(e)  =  £ulnle1"9  (5-22) 

n=-P 

be  a  positive  function  of  6  in  [-ir,  v) .  Then  (u0,uj , . . .  ,up) 
denotes  an  arbitrary  element  of  £Pp. 

These  represent  only  a  few  of  the  infinitely  many  ways 
of  characterizing  #p.  The  first  three  par ametrizat ions  are 
well  known  with  the  corresponding  terminology  well  estab¬ 
lished  in  the  literature.  Each  set  of  predictor  coeffi¬ 
cients  is  related  to  a  unique  set  of  reflection  coefficients 
by  a  continuous  bisection  defined  by  the  Levinson-Durbin 
recursion.  Each  set  of  autocorrelation  coefficients  defines 
a  unique  set  of  predictor  coefficients  according  to  the 
Yule-Walker  equations  while  the  autocorrelation  coefficients 


may  be  retrieved  from  the  predictor  coefficients  using  Fqua- 
tions  (2.3)  and  (2.8). 

The  last  parametrization  is  less  common  than  the  other 
three;  these  parameters  have  been  denoted  "inverse  cor¬ 
relation  coefficients"  since  they  are  the  autocorrelation 
coefficients  of  a  moving-average  process  whose  spectral 
density  function  is  inverse  to  that  of  the  defined  AR(P) 
spectrum.  Each  set  of  predictor  coefficients  uniquely  de¬ 
fines  the  inverse  correlation  coefficients  according  to 

P-n 

un  =  ^  am  am+n/0-2  »  ao  =  1  i  n=0,1,...,P  (3.23) 
m=0 

That  the  predictor  coefficients  may  be  retrieved  in  a  unique 
fashion  from  the  inverse  correlation  coefficients  is  more 
difficult  to  establish.  Positivity  of  Equation  (3-22)  gen¬ 
erally  establishes  only  the  possibility  of  several  appro¬ 
priate  predictor  coefficient  sequences;  closer  inspection 
reveals  that  only  one  of  these  sequences  satisfies  the  sta¬ 
bility  requirements.  The  question  is  taken  up  in  somewhat 
greater  detail  by  Blackman  and  Tukey  [5,  pp.  126-7]. 

The  first  result  follows  easily  using  the  inverse  cor¬ 
relation  coefficient  parametrization  of  the  AR(P)  spectral 
density,  Equation  (3*22),  together  with  Equations  (3*19)  and 
(3-18). 
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Theorem  3-1  «  (Triangle  Equality).  Let  (0)  be  an 
AR(P)  spectral  density  satisfying  Equation  (3-1 9 )  and  let 
g2(0)  be  any  other  AR(P)  spectral  density.  Then 

Iw(Hf,g2)  =  Iw(Hf,g1)  +  IW(gl.62>  (3-24) 

The  inverse  correlation  coefficient  parametrization  of 
AR(P)  models  in  #p  is  used  here  to  define  the  convex  super¬ 
position  of  two  models  according  to 

u3  a  CS  (l^  ,u2;Y)  ~  Yu^  +  (1-Y)  u2  (3*25) 

for  0  _<_  \  _<  1  .  Since  (3*22)  remains  a  strictly  positive 
function  for  u3  when  Uj  and  u2  define  strictly  positive 
functions,  this  shows  91 p  to  be  a  convex  set  and  leads  to  the 
second  result. 

Lemma  3-1  ■  (Strict  Convexity).  Let  g3(e)  be  a  stable 
AR(P)  spectrum  defined  by  the  convex  superposition  of  the 
two  stable  AR(P)  spectra  g1 (0)  and  g2(0).  Then 

Iw(f*g3>  A  Ylw(f,g,)  +  (1-Y)  Iw(f,g2)  (3-26) 

for  0  V  _<  1  with  equality  only  if  g^(0)  =  g2(0). 

Proof.  Using  the  inverse  correlation  coefficient  pa¬ 
rametrization  and  the  definition  of  convex  superposition  for 
AR(P)  spectra  it  is  easy  to  show  that 


15  2 


S3 ( 0  ^  =  1 / i L v/si ( 0 ) ]  +  L ( i -v)/g2(e) ] } 


(3*27) 


Together  with  Equations  (3*18)  this  yields 
Yiy(f,gi)  +  (1-Y)  Iv/^*82)  “  I\f(f>g'3) 

/it 

W(0)  ln{[gl(0)JV  [g2(e)]1'V/g3(0)i  d0/2ir  (3-28) 

-rf 

Prom  the  theorem  on  geometric  and  harmonic  means  the  argu¬ 
ment  of  the  logarithm  in  Equation  (3.28)  is  not  less  than 
one  and  equals  one  only  if  g^(0)  =  g2(®)*  The  lemma  follows 
easily. 

Theorem  3-2.  (Uniqueness).  I\j(f,g)  can  have  at  most 
one  local  minimum  in  #p;  if  such  a  minimum  exists  it  is  also 
a  global  minimum. 

Proof .  Let  g^(6)  and  g2(0)  be  two  distinct  local  mini¬ 
ma  and  form  their  convex  superposition  g^(0).  Without  loss 
of  generality  assume  Iw(f  ,g1  )  >_  Iw(f  ,g2)  •  With  Y  i  1  the 
previous  lemma  gives 

IvU.«3)  <  Vlw(f,g1)  +  ( 1  — Y )  Iw(f,g2)  <  Iw(f»8l)  (3-29) 
But  gj(B)  is  arbitrarily  close®  to  gj(0)  for  Y  arbitrarily 


®The  Euclidean  metric  applied  to  the  inverse  corre¬ 
lation  coefficients  shall  suffice  to  define  closeness  here. 


close  to  one,  so  that  this  inequality  contradicts  the 
assumption  that  g^(@)  is  a  local  minimum.  The  second  part 
of  the  theorem  follows  by  assuming  g^(0)  is  a  local  minimum 
while  g2(9)  is  any  distinct  element  ofdfp  such  that 
I^(f,g2)  _<_  Iy(f,gi)  and  then  repeating  the  above  argument. 

In  order  to  establish  the  final  theorem  of  this  section 
the  second  mixed  partial  derivative  of  Iy(Hf,g)  is  shown  to 
define  a  positive  definite  quadratic  form.  The  variables 


uo 

for 

n=0 

2un 

for 

n/0 

(3-50) 


are  defined  for  n=0,1,...,P  so  that  the  first  partial  deriv¬ 
atives  are 


9 lw(Hf ,g)/  a  v n 


w(e))H(e)  f(e)  -  g(e)}  cos(ne)  de/2 


TT 


and  the  second  mixed  partial  derivatives  are 


(3.31 ) 


Jnm  _  9^n/  9vm 

=  ^/*W( 0) [g( e)  ]2  cos(ne)  cos(m0)  d0/2ir 


—  TT 


(3.32) 


Clearly,  the  quadratic  form 


p 

xn  xm  ^nm 
m,n=0 

is  positive  definite.  This  proves  the  following 

Theorem  3-3.  (Absence  of  False  Solutions).  Any  stable 
AR(P)  solution  to  Equations  (3-19)  is  a  local  minimum. 

Note  that  this  does  not  eliminate  the  possibility  of 
unstable  solutions  to  Equations  (3-19).  nor  does  it  estab¬ 
lish  the  existence  of  a  stable  solution.  Since  the  previous 
theorem  has  established  the  uniqueness  of  a  minimum  this 
theorem  establishes  the 

Corrollary  3-1 •  Equations  (3*19)  can  have  at  most  one 
stable  AR(P)  solution.  If  such  a  solution  exists  it  is  the 
unique  absolute  minimum  of  Iy(Hf ,g)  over  £?p. 

Remarks 

Three  general  formulations  for  estimating  the  parame¬ 
ters  of  an  AR(P)  process  in  noise  have  been  discussed.  The 
first  "ideal"  formulation  has  theoretical  foundations 
resting  upon  principles  of  information  theory  as  well  as  the 
maximum  likelihood  method.  The  second  two  formulations  are 
developed  as  approximations  to  the  first. 

The  need  for  approximate  formulations  arises  due  to  the 
difficulty  posed  by  the  nonlinear  equations  resulting  from 
the  ideal  formulation.  The  first  approximate  formulation 


rw 

■  t 

I  w(e )[g(e) ]2 

£xn  cos(ne) 

L 

_n  =  0 

d0/2w 

(3-33) 
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leads  to  the  Yule-Walker  equations  but  with  modified  corre¬ 
lation  values;  algorithms  for  solving  the  Yule-Walker  equa¬ 
tions  are  computationally  simple  and  well  understood  while 
methods  for  evaluating  the  modified  correlation  values  have 
been  carefully  studied  in  recent  years. 

While  this  first,  noise  filtering,  approach  has  led  to 
demonstrable  performance  improvements  in  noise  environments 
over  the  standard  noise  free  formulation  (and  reduces  to  the 
noise  free  formulation  in  noise  free  environments),  still 
better  performance  is  desired.  Rather  than  attempt  direct 
solution  of  the  ideal  formulation  the  second  approximate 
formulation  is  developed.  Evidence  that  this  weighted  in¬ 
formation  formulation  leads  to  improved  performance  over  the 
noise  filtering  formulation  is  presented  in  Chapter  V; 
neither  approximate  formulation  is  expected  to  perform  as 
well  as  the  "ideal"  formulation. 

The  weighted  information  formulation  is  related  to 
other  techniques  that  have  appeared  in  the  literature. 
Consider  the  situation  wherein  the  desired  signal  spectrum 
is  essentially  zero  outside  the  region  0«[-w/$,  w/3>)  while 
the  noise  spectrum  is  essentially  zero  inside  this  region. 
The  foregoing  theory  indicates  that  an  appropriate  selection 
for  the  weight  function  is 

|  1  9«[-w/®,  "/$) 

w( 0 )  =  H(e)  = 


o 


otherwise 


(3-34) 


ao  that  the  weighted  information  is 


/V® 

{[f(e)/g(e)]  -  in[f(e)/g(e)j  -  1}  de/2ir  (3.35) 
-n/s 

With  the  change  of  variable  0/$  =0  this  may  be  rewritten 


Iw(Hf,g) 


(1/S) 


{[f(e/s)/g(e/$) ]  - 
ln[f(e/D)/g(e/S)]  -  1}  de/2 


IT 


(3.36) 


Clearly  the  indication  here  is  to  low  pass  filter  and  deci¬ 
mate  the  observed  signal  before  fitting  the  AR(P)  model  to 
the  resulting  data.  This  is  precisely  the  technique  em¬ 
ployed  by  Quirk  and  Liu  [45]  to  improve  the  resolution  of 
AR(P)  estimation  in  noise;  they  considered  the  use  of  AR(P) 
estimators  to  determine  the  frequencies  of  sinusoids  in 
noise  and  demonstrated  that  the  filtering/decimation  scheme 
is  clearly  advantageous  when  the  sinusoids  are  a  priori 
known  to  lie  in  some  fixed  frequency  range. 

The  problem  which  motivates  the  present  work  concerns 
signal  and  noise  spectra  that  are  both  generally  nonzero 
throughout  the  entire  frequency  range,  [-*,'IT);  hence  the 
luxury  of  simple  filtering/decimation  schemes  is  not  permit¬ 
ted.  On  the  other  hand,  the  difficulties  associated  with 
very  limited  quantities  of  data  are  not  the  primary  focus  of 
this  work  so  that  the  asymptotic  formulation  is  considered 
adequate . 
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Computational  issues  for  the  weighted  information  for¬ 
mulation  are  discussed  in  Chapter  IV.  Equations  (3.1 9)  are 
cast  in  algebraic  form  and  their  (exact)  analytical  solution 
is  discussed.  Approximate  (numerical)  solution  methods 
might  be  developed  based  upon  the  resulting  analytical 
system  of  equations  or  directly  upon  minimization  of 

A 

Iy(Hf,g);  the  latter  approach  is  adopted  to  develop  a  simple 
iterative  procedure  based  upon  the  notion  of  a  contraction 
mapping.  In  addition,  computational  procedures  appropriate 
to  the  use  of  the  weighted  information  for  vector  quantiza¬ 
tion  are  discussed.  Since  in  many  applications  the  "vector 
quantization  codebook"  may  be  designed  "off-line"  using 
noise  free  speech  data,  questions  associated  with  the  code- 
book  design  problem  are  not  discussed;  instead,  computa¬ 
tional  procedures  for  the  "on-line"  minimization  of  I^(Hf,g) 
over  the  finite  codebook  are  developed. 


CHAPTER  IV 

COMPUTATIONAL  FORMULATION 

In  this  chapter  computational  procedures  for  the 
solution  of  Equations  (3*19)  are  discussed.  In  the  first 
section  the  system  is  reduced  to  an  algebraic  form  by  as¬ 
suming  the  weight  function  to  take  the  form  of  an  AR(M) 
power  spectral  density;  once  cast  as  a  nonlinear  algebraic 
system  of  equations,  analytic  procedures  for  solving  the 
system  are  discussed.  In  the  second  section,  techniques  for 
evaluating  the  coefficients  of  the  system  are  discussed. 

Analytic  solution  of  the  nonlinear  algebraic  system 
becomes  increasingly  difficult  as  the  order  of  the  weight 
function,  M,  is  increased.  While  numerical  polynomial  root 
solving  procedures  could  be  systematically  applied,  the 
third  section  develops  instead  an  iterative  procedure  based 
upon  the  idea  of  a  contraction  mapping.  Together  with 
sampled  frequency  domain  processing  techniques,  these  iter¬ 
ative  procedures  do  not  restrict  the  weight  function  to  an 
all-pole  form.  The  fourth  section  develops  computational 
formulae  required  for  the  use  of  the  weighted  information  in 
vector  quantization;  an  extension  of  Jensen's  theorem  is 
developed  to  permit  closed  form  evaluation  of  the  ap¬ 
propriate  integrals  when  the  weight  function  assumes  an 


-~V4 


AR(M)  form.  Finally,  the  last  section  concludes  this  chap¬ 
ter  with  some  final  remarks  concerning  these  computational 
methods. 


Reduction  to  Algebraic  Form 


w(e)  fi(e)  f(e)  eine  de/2i 


n  =  0, 1 , ...» P 


(4.1) 


denote  the  coefficients  appearing  on  the  left  hand  side  of 
Equations  ( 3 - 1 9 ) •  Let 


/:• 


(e)  g(e)  ein0  de/2ir 


n  =  0,1, ...,P  +M 


(4.2) 


denote  the  quantities  appearing  on  the  right  hand  side  of 
Equations  (5*19)*  Observe  that  the  index  of  Pn  is  permitted 
to  range  beyond  P  to  P+M.  If  W(0)  is  an  AR(M)  spectrum 
given  by 


W(e)  =  *§/[BM(ei0)  BM(e"i0)] 


(4.3a) 
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where 


M 

BjjCz)  =  Vbm  z"m;  b0  =  1  (4.3b) 

m=0 

and  if  g(0)  is  an  AR(P)  spectrum  given  by  Equations  (2.3) 
and  (2.4)  then  their  product  is  an  AR(P+M)  spectrum  given  by 

W(0)  g(0)  =  <r^/[Cp+flj(  e'*'6  )  Cp+m(e  )  ]  (4-4a) 


where 


P+M 

Cp+M(z)  =  Ap(z)  B„(z)  =  cmz“m;  cQ  =  1  (4. 4b) 

m=0 

The  quantities  defined  by  Equation  (4-2)  are  related  to 
the  polynomial  coefficients  in  Equation  (4.4b)  by  the  Yule- 
Walker  equations 


pP+M  PP+M-1  •  *  * 


PP+M 

1 

1 

PP+M-1 

0 

0 

= 

0 

0 

• 

• 

P° 

• 

• 

_G  P+M_ 

• 

• 

0 

2  2 

aWa 


(4.5) 


Equations  (3*19)  assign  numerical  values  to  some  of  the 
entries  in  the  coefficient  matrix  according  to 


Pn  =  pn  •  n=0, 1 , . . .  P 


(4 
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while  the  remaining  entries  are  to  be  considered  as  un¬ 


knowns.  The  elements  of  the  column  vector  are  defined  as  a 


linear  combination  of  the  coefficients  of  the  unknown  poly¬ 


nomial,  Ap(z),  by  Equation (4.4b)  which  may  be  rewritten  in 


matrix  form  as 


(4.7) 


Equations  (4*5),  (4*6),  and  (4-7)  define  a  nonlinear 


system  of  P+M+1  multivariate  polynomials  in  the  P+M+1  un- 


A  ^  A 

knowns  cr,  •  •  •  *  ap»  ^p+1  »  ^P+2'  •  •  • »  ^P+M' 


Each 


polynomial  is  a  first  order  function  of  each  unknown  while 


each  term  in  these  polynomials  may  involve  up  to  two  dis¬ 


tinct  unknowns.  The  properties  of  the  weighted  information 


developed  in  Chapter  III  indicate  that  this  system  of 


equations  can  have  at  most  one  stable  solution;  if  a  stable 


solution  exists  it  is  the  solution  sought. 

Assuming  the  AR(M)  weight  function  to  be  stable  the 


product  polynomial,  Cp+M(z),  also  has  all  its  roots  inside 


the  unit  circle  and  may  be  expressed  recursively  in  terms  of 


a  set  of  reflection  coefficients  according  to 


cn(z)  =  Cn_i  ( z)  +  kn  z~n  C^U-1);  C0(z)  =  1 


(4.8) 


%  .%  V  \  *.  V 
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for  n  =  1,2,...,P+M.  If  the  coefficient  matrix  in  Equation 
(4-5)  were  entirely  known  then  the  Levinson-Durbin  re¬ 
cursion^  could  be  applied  to  yield  Cp+jyj(z).  Since  some  of 
the  entries  in  the  coefficient  matrix  are  unknown,  the 
Levinson-Durbin  recursion  cannot  proceed  beyond  the  determi¬ 
nation  of  Cp(z);  the  remaining  reflection  coefficients 
jkp+j  ,  kp+2****»  kp+jy|}  are  unspecified  (beyond  the  stability 
requirement  that  |kn|  <  1)  by  Equations  (4*5)  and  may  be 

*  Ai  ^ 

considered  as  new  unknowns  replacing  {  Pp+i  »  pp+2»‘***  Pp+M>* 

These  remaining  reflection  coefficients  should  be  se¬ 
lected  so  that  Cp+jn(z)  =  0  modulo  Bjj(z).  Once  these  have 
been  determined  the  solution  may  be  obtained  by  simple  poly¬ 
nomial  division  from 

Ap(z)  =  Cp+fl(z)/Bji[(z)  (4-9) 

together  with 

p+M 

°-2  =  (  Po/°tf)  ]7(1-kn>  (4.10) 

n=l 

To  determine  the  remaining  reflection  coefficients  it 
is  generally  simpler  to  consider  the  polynomials 


This  well-known  algorithm  may  be  found  in  many  fairly 
recent  publications;  for  example,  see  [21,  p.  55ff]«  An 
exposition  by  the  authors  is  contained  in  [57 J  and  [58]. 
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Cp+M(z)  =  z-(p+M)  Cp+M(z-1  ) 

Bm(z)  =  z"M  Bm(z-1) 

30  that  the  condition  to  be  satisfied  is 


(4.11) 

(4.12) 


Cp+M(z)  =  0  modulo  B^z)  (4.13) 

Modulo  reduction  is  then  accomplished  more  simply  by  re¬ 
peated  use  of  the  substitution 

M-l 

z'M  -  -  X  *>M-/  (4-H) 

1  =  0 

in  Cp+M(z)  until  all  powers  of  z”1  larger  than  M-1  have  been 
eliminated.  The  reduction  process  is  facilitated  by  using 
the  recursion  (4.8)  to  express  Cp+M(z)  as 

Cp+M(z)  =  Cp(z)  Em(z)  +  z"P  CpCz-1)  FM(z)  (4-15) 


where 

En(z)=z-1  En-1(z)  +  kp+n  z"(n_1^  pn-1  (z""1  ) »  E0(z)  =  1  (4-  16a) 
Fn(z)=z_1  Fn_.,(z)  +  kp+n  z“^n-1  ^  E^^z”1);  FQ ( z ) =0  (4.16b) 
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z)  =  z"P  C 


With  these  formulae  the  reduction  is  accomplished  in  part  by 


determining 


=  Cp(z)  mod  Bm(z) 


(4.18a) 


=  Cp(z)  mod  BjjCz)  =  z“P  CpCz-1  )  mod  Bj,j(z)  (4.18b) 


The  condition  to  be  satisfied  is  then 


(z)  Ejj(z)  +  Djh_^  (z)  Fjj(z)  =0  mod  B^(z) 


(4.19) 


Modulo  reduction  of  the  left-hand  side  of  Equation 


(4-19)  leads  to  an  M-1  3t  order  polynomial  whose  M  coef¬ 


ficients  must  be  equated  to  zero;  this  yields  a  system  of  M 


nonlinear  polynomial  equations  in  the  M  unknowns  {kp+1, 
kp+2»  •*.»  kp+jn|j .  While  these  equations  are  nonlinear  some 


reflection  will  reveal  that  each  polynomial  equation  is 


linear  (i.e.,  of  first  degree)  in  each  of  the  unknowns;  the 


nonlinearity  enters  by  way  of  terms  involving  products  of 


different  unknowns. 


Because  of  this  structure,  systematic  algebraic  elimi¬ 
nation^  will  yield  an  order  polynomial  in  a  single  un¬ 
known;  each  acceptable  root  of  this  polynomial  will  yield  an 
M-1 order  polynomial  in  a  second  unknown.  Continuing  in 
this  fashion  one  successively  solves  M-1st,  ...  order 
polynomial  equations  possibly  generating  M  factorial  po¬ 
tential  solutions  of  which  at  most  one  satisfies  the  sta¬ 
bility  criteria.  This  method  is  feasible  for  small  values 
of  M  (e.g.  M  _<_  4)  but  for  larger  values  of  M  one  must  gener¬ 
ally  resort  to  numerical  polynomial  root  solving  pro¬ 
cedures  .  ^ 

For  the  case  M=2,  let 


D1  (z) 


dQ  +  d1 


(4.20a) 


Dl(z)  =  do  +  d1  z 


(4.20b) 


p 

“^Several  methods  (such  as  those  due  to  Euler,  Bezout, 
or  Sylvester)  are  available;  one  should  take  care  not  to 
introduce  extraneous  roots.  For  a  general  discussion  see 
[59,  Vol .  II,  p.  70ff]  or  [60,  p.  277ff]. 

■^The  recommendation  that  M  not  exceed  four  is  made 
based  upon  the  fact  that  general  polynomial  equations  of 
degree  five  and  higher  cannot  be  solved  algebraically  [59, 
Vol.  II,  p.  286].  Of  course  this  does  not  eliminate  the 
possibility  of  transcendental  solutions  [59,  Vol.  I,  p.  274] 
or  the  possibility  that  some  special  structure,  may  be 
discovered  (or  imposed)  to  aid  in  the  solution. 
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and  let 


3 

G^(z)  =  (z)  E2(z)  +  D1  (z)  P2(z)  =  ^  gjj  z~ra  (4-21  ) 

m=0 

denote  the  left  hand  side  of  Equation  (4-19).  Using 
Equations  (4. 16)  these  coefficients  are 


«o  =  do  kP+2  (4.22a) 
Si  =  do  kP+1  kP+2  +  do  kP+1  +  d1  kP+2  (4.22b) 
g2  =  d0  +  ^1  kP+1  kP+2  +  d1  kP+1  (4.22c) 
g3  =  d,  ( 4 • 22d ) 


while  modulo  reduction  yields 

50  -  b2  S2  +  bi  b2  S3  =  0  (4.25a) 

51  -  b1  82  +  (bf~b2)  S3  =  0  (4.25b) 

Expanding  Equations  (4*25)  yields 

P0  kp+2  +  Pi  *  kp+l(<lo  kp+2  +  <M)  (4.24a) 

Po  kP+2  +  Pi  =  fcp+iUo  kP+2  +  §1  )  (4.24b) 
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where 


Po  =  do 


(4.25a) 


p1  -  d1  b1  b2  -  d0  b2 


(4.25b) 


(4.25c) 


=  ?,(bf-b2)  -  a0  b, 


(4-25d) 


d1  b2 


(4-25e) 


d1  b2 


(4.25f) 


=  d1  b,  - 


(4.25g) 


d1  b1 


(4.25h) 


So  that  the  solutions  are  given,  upon  elimination,  by 


kp+i  =  (p0  kp+2  +  Pi  )/Cq.0  kP+2  +  'll  ) 


(4.26) 


=  [  -si  + 


s?  -  4so  s2  ]  /2s; 


(4.27) 
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where 


30  =  Pi  Si  -  Pi  <li  (4.28a) 

Si  =  Pi  So  -  Pi  q0  +  P0  Si  -  P0  qi  (4.28b) 

s2  =  P0  S0  "  P0  <lo  (4.28c) 

Evaluation  of  Coefficients 

There  are  numerous  methods  of  coefficient  evaluation 
that  may  be  considered  consistent  with  the  foregoing  formu¬ 
lation.  Generally  this  section  will  present  a  few  of  the 
possibilities  for  evaluating  the  coefficients  defined  by 

Equation  (4-1).  In  addition,  some  discussion  will  be  de¬ 
voted  to  characterizing  the  weight  function  according  to 
Equations  (4- 3).  While  performance  of  the  estimation  pro¬ 
cedure  will  undoubtedly  depend  upon  the  specific  method  of 
coefficient  evaluation,  no  one  method  can  be  strongly  advo¬ 
cated  (i.e.  to  the  total  exclusion  of  other  methods)  at  this 
time;  in  addition,  final  selection  of  a  method  may  be  in¬ 
fluenced  by  other  ancillary  requirements  of  the  specific 
application.  Because  no  single  explicit  procedure  ? s  to  be 
recommended  here  the  discussion  stresses  concepts  rather 
than  detailed  mathematical  formulae. 

Time  domain  noise  filtering  methods  usually  determine 
(adaptively)  the  coefficients  of  a  finite  impulse  response 

A 

(FIR)  linear  filter  whose  power  spectral  response  is  H(0). 
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By  simply  cascading  two  of  these  filters  a  new  filter  is 

A  A, 

created  whose  power  spectral  response  is  H(9)  H(e).  The 

coefficients  in  Equation  (4.1)  may  then  be  computed  in  the 
usual  manner  (lag  products  of  the  windowed  data)  from  the 
output  of  the  cascaded  filter  structure.  This  scheme,  de¬ 
picted  in  Figure  2,  assumes  the  relationship  expressed  by 
Equation  (3-20)  although  this  relationship  may  generally  be 
avoided  by  replacing  one  of  the  filters  in  the  cascade  by  a 

filter  with  W(e)  as  its  power  spectral  response.  For  each 

data  window,  a  "snapshot”  of  the  impulse  response  of  the  FIR 
filter  could  be  used  to  estimate  the  parameters  of  W(e). 
Since  the  response  of  the  FIR  filter  may  differ  slightly 
from  the  response  of  the  weight  function  a  somewhat  more 
consistent  procedure  would  use  the  weight  function  pa¬ 
rameters  to  implement  an  infinite  impulse  response  (HR) 
filter  as  the  second  filter  in  the  cascade. 

Frequency  domain  noise  filtering  methods  generally 
provide  greater  flexibility  in  response  function  selection 
than  is  available  with  time  domain  methods.  These  methods 
involve  an  explicit  transformation  to  the  frequency  domain, 
often  by  using  the  discrete  Fourier  transform  (DFT),  cud 
determine  the  multiplicative  response  function,  H(e),  in 
sampled  form  using  a  formula  such  as  Equation  (-.19)  or 
(2.20).  The  sampled  form  of  H(0)  may  be  used  to  estimate 
the  parameters  of  W(e).  If  the  noise  filtered  signal  is  not 
required,  frequency  samples  of  the  weight  function  may  be 
used  multiplicatively  before  evaluating  the  coefficients: 


alternatively,  one  may  avoid  re-evaluating  the  weight  func- 

A 

tion  and  simply  apply  H(e)  twice.  This  latter  alternative 
is  depicted  in  Figure  3* 

A  mixed  time-frequency  domain  method  is  employed  to 
obtain  some  of  the  results  presented  in  Chapter  V.  In  this 
method  a  Hamming  window  is  applied  to  the  observed  data 
which  is  then  zero-extended  before  computing  the  DFT.  A 
sampled  noise  spectrum  estimate  is  used  together  with  these 
transform  values  to  compute  a  noise  filter  spectral  re¬ 
sponse,  H(8),  according  to  Equations  (2.19)  or  (2.20).^ 
This  frequency  sampled  noise  filter  response  is  applied 
multiplicatively  to  the  transform  values  and  an  inverse  DFT 
of  these  modified  transform  values  (with  their  original 
phase  values)  is  computed.  A  random  phase  characteristic  is 
computed  and  introduced  to  the  frequency  sampled  noise 
filter  spectral  response  which  is  inverse  transformed  to 
obtain  an  impulse  response  characteristic.  Standard  (auto¬ 
correlation  method)  LP  analysis  is  applied  to  this  impulse 
response  characteristic  to  determine  the  parameters  of  the 
weight  function.  These  parameters  are  used  to  implement  a 

^It  is  generally  found  to  be  useful  to  modify  the  fre¬ 
quency  response  characteristic  slightly  by  smoothing  the 
response  obtained  from  (2.19)  or  (2.20;  across  frequency. 
The  smoother  should  eliminate  features  narrower  than  those 
expected  in  the  final  signal  spectrum  while  retaining 
broader  features;  a  recursive  median  filter  with  a  total 
length  of  about  2.5#  of  the  single-sided  bandwidth  is  a 
current  favorite  of  this  author.  End  conditions  (near  the 
DC  and  Nyquist  frequencies)  can  be  properly  handled  using 
the  known  periodic  nature  of  the  frequency  response. 


(lattice  structure)  filter;  beginning  in  the  all-zero  state 
the  noise  filtered  (inverse  transformed)  data  values  are 


passed  through  this  filter  which  is  then  permitted  to  "ring" 
awhile. ^  Lag  products  computed  from  this  output  then  pro¬ 
vide  the  required  coefficient  estimates;  the  overall  pro¬ 
cedure  is  depicted  in  Fitrure  4. 

Finally,  it  is  worth  mentioning  that  each  of  these 
methods  has  recommended  computing  the  final  coefficient 
estimates  as  lagged  products.  The  reason  for  this  is  that 
various  quantization  effects  may  occur  up  to  the  point  of 
obtaining  the  modified  data  samples;  however,  if  full  pre¬ 
cision  is  maintained  in  the  final  lag  product  computations, 
the  resulting  coefficient  estimates  will  define  a  positive 
definite  symmetric  Tbeplitz  quadratic  form  in  all  but  a  very 
few  highly  exceptional  cases  (such  as  all  modified  data 
samples  being  identically  zero). 

Iterative  Techniques 

Equations  (3-19)  may  be  solved  when  the  weight  function 
has  an  AR(M)  form  by  using  the  algebraic  procedures  de¬ 
scribed  in  the  first  section  of  this  chapter;  this  method  is 
appropriate  if  M  <_  4«  Unfortunately,  it  is  expected  that 
accurate  estimation  of  speech  spectra  will  require  weight 
functions  with  greater  variation  than  is  possible  with  an 


^That  is  to  say  that  a  zero  input  is  applied  to  the 
filter  after  all  the  noise  filtered  data  values  have  been 
applied  as  input. 
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Figure  4.  Mixed  Time-Frequency  Domain  Coefficient  Evaluation. 


AR(4)  form.  The  procedures  of  the  first  section  might  be 
extended  by  applying  numerical  polynomial  root  solving  pro¬ 
cedures  when  M  becomes  large  but  at  present  such  an  approach 
appears  somewhat  cumbersome. ^  In  this  section  alternate 
numerical  formulations  are  discussed  that  do  not  make  spe¬ 
cific  (parametric)  assumptions  as  to  the  form  of  the  weight 
function;  these  techniques  are  iterative  and  based  upon  the 
notion  of  a  contraction  mapping.  A  good  general  reference 
for  this  section  is  Collatz  [61]. 

Most  (single-step)  iterative  procedures  can  be  ex¬ 
pressed  in  the  form^ 


v(n+1 )  _  ^(yCn))  (4.29) 


^For  the  reader  wishing  to  pursue  this  approach  it  is 
worth  noting  that  one  stumbling  block  is  that  the  previous 
uniqueness  theorem  has  not  eliminated  the  possibility  of  an 
unstable  (or  imaginary)  solution  to  Equations  (4*5),  (4.6), 
and  (4-8)  for  which  some  (but  not  all)  of  the  reflection 
coefficients  are  real  and  in  the  interval  (-1,  1).  If  one 
could  devise  a  method  which  guarantees  that  only  the 
solution  sought  has  real  parameters  isolated  in  (-1,  1),  or 
some  other  known  interval,  the  development  of  a  numerical 
algorithm  would  be  greatly  facilitated.  The  reader  is  re¬ 
ferred  to  [60,  p.  99ffJ  or  any  similar  discussion  of  nu¬ 
merical  methods  for  determining  real  roots  of  polynomials. 

^Parenthesized  superscripts  shall  denote  instances  of 
the  parameter  vector  while  subscripts  shall  denote  com¬ 
ponents  of  the  parameter  vector. 


where  v^n^  is  the  n*'*1  iterate  of  the  parameter  vector  v. 
The  solution  sought  is  a  fixed  point  of  the  map  If 

Q 

satisfies  a  Lipschitz  condition 

|j£(v(1))-£(v(2))||  _<  ii?||v^1^-v^2^||  (4-30) 

for  some  0  <  <£  <  1  then  <p  is  said  to  be  a  contraction  map. 
Contraction  maps  are  often  used  to  prove  existence  theorems 
because  the  sequence  of  iterates  generated  by  (4.29)  is 
Cauchy. 

The  problem  of  designing  an  iterative  procedure  for 
solving  a  system  of  equations  can  be  viewed  as  the  problem 
of  finding  a  contraction  map  whose  fixed  points  coincide 
with  the  solutions  sought.  One  usually  begins  with  a  map 
having  the  appropriate  fixed  points  and  then  tries  to  show 
it  satisfies  a  Lipschitz  condition;  often  one  employs  the 
mean  value  theorem  which  states  that  if  <Pn  is  a  continuously 
differentiable  function  of  the  parameter  vector  v  then^ 


O  _ 

The  map  <p  is  assumed  to  have  its  domain  in  a  Banach 
space  with  norm  ||-||  and  its  range  contained  by  the  domain. 

^Two  notational  conventions  are  introduced  here.  First 
<Pn/t  denotes  9<Pn/dvt  and  second  the  Einstein  summation  con¬ 
vention  (with  respect  to  repeated  subscripts)  is  employed. 
The  summation  range  is  0,1,..., P  so  that  the  Einstein  con¬ 
vention  implies  summation  with  respect  to  the 
subscript  <  (only)  over  this  range  on  the  right  hand  side  of 
(4.31).  These  conventions  are  used  in  this  section  only. 
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*n(v(1  })  -  <Pn(v^2^)  =  iPnjt  (vvO)  +  [1-YJ  v^2b  ivf(1)  -  v/2^| 

(4.31 ) 


for  some  0  <  \  <_  1.  If  one  can  determine  a  constant  SB  <  1 
majorizing  the  norm  of  the  matrix  with  components  <pn/f  then 
v  has  been  demonstrated  to  satisfy  a  Lipschitz  condition. 

Using  Equations  (3-22),  (3*30),  (3*31 )»  (3*32)  and 

(4.1)  the  system  of  Equations  (3*1 9 )  may  be  expressed  as 

Vn  =  0  ;  n  =  0,1,. ..,P  (4.32) 

where 


Vn  =  pn  Lnm  vm 


(4.35) 


Defining 


Lnmi 


w(e)[g(e)]5  cos(ne) 


cos(me) 


cos(l9)  de/2-n 


(4.34) 


and 


6 


nm  ~ 


0 


1 


n  /  m 


n  =  m 


(4.35) 
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the  following  relations  may  be  easily  verified 


Lnm/l  =  “2  Lnmf 
^n/i  =  ~  ^nm/i  vm  ”  ^nm  vm/i 
=  2  ^nmi  vm  ”  ^nm  fimi 

=  2  Lni  "  Lm  =  Lni 
Consider  the  map  <p  with  components^® 

vn.  =  vn  *  t-^o  Jnm  ^m 


(4.36) 


(4.37) 


(4.38) 


where  X  is  a  nonzero  scalar  constant.  Use  of  this  map  for 
an  iterative  procedure  is  essentially  a  modified  Newton 
method.  First  observe  that  v  has  a  fixed  point  if  and  only 
if  the  second  term  on  the  right  hand  side  of  (4.38) 
vanishes.  This  term  vanishes  if  and  only  if  Equations 
(4.32)  are  satisfied  since,  as  shown  in  Chapter  III,  L  (and 
so  also  L”1  and  L”^ )  is  positive  definite. 


®If  L  denotes  the  matrix  with  entries  L-m  and  L”! 
the  inverse  of  this  matrix  then/nI»Q  shall  denote  Ii 
evaluated  at  the  initial  iterate  and  [L^1  ]nm  its 

entries. 


Next,  using  (4.37),  consider 


vn /t  ~  vn h  ~  ^  -lnm  ^m/l 
=  ^n|  ~  ^  t^o^nm 

which,  if  evaluated  at  v  =  v^\  is 

-  o-a)  tnt 


(4-39) 


(4. 40) 


Clearly,  (4.40)  is  majorized  by  2  -  1 1  —  X |  so  that 
X  should  be  selected  in  the  range  0  <  X<  2  if  the  Lipschitz 
condition  is  to  be  satisfied.  More  generally,  since  the 
last  term  in  (4-39)  is  positive  definite,  X  should  be  se¬ 
lected  in  the  range  0  <  X  <  2/Xmax  where 

‘.M>  Wt  If 

bounds  the  matrix  norm.  With  this  selection 


inf  n 

N=i  q" 


rn/t  1/ 


1  - 


X  fe|-l  ^ 


1  >  -1 


(4.42) 


and  the  matrix  norm  of  <pnft  is  bounded  by  one. 

Apparently  the  choice  X=  lAmax  would  lead  to  the  most 
rapid  convergence  while  smaller  values  would  lead  to  slower 
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convergence  and  guarantee  that  <t>n/t  ia  positive  definite. 
Unfortunately,  the  right  hand  aide  of  inequality  (4-41)  is  a 
function  of  the  parameter  vector  v  and  cannot  be  bounded  by 
a  constant,  Xmax,  for  all  v  in  £Pp;  consequently  the  Lip- 
schitz  condition  cannot  be  satisfied  everywhere  in  5?p. 

If  a  solution,  g*(0),  exists  in  £Pp  it  is  possible  to 
find  a  constant  Gmax  sufficiently  large  such  that 

g*(0)  _<_  Gmax  (4-43) 


for  all  0<[-ir,ir).  For  such  a  constant  the  solution  will  be 
contained  in  that  portion  of  for  which 


g(0)  1  G 


max 


(4.44) 


for  all  0c[-ir,TT).  Then  from 


3Ufp  ^n^o  -^nm  ^m / 


Nl 


-  ||q||^l  *n  Lnm  am/® 


—  wmax  Gmax/® 


where 


W(0)  <  W 


max 


(4-45) 


(4.46) 


l8l 


for  all  9« j_  —  it,  ir )  and 


it  is  clear  that  any  choice 

Amax  2.  wmax  Gmax/®  (4-48) 

will  suffice  to  satisfy  the  Lipschitz  condition  for  that 
portion  of  ^?p. 

To  recapitulate,  the  map  v,  defined  by  (4*38),  has 
fixed  points  coinciding  with  the  solutions  to  (4-32).  More¬ 
over,  if  there  exists  a  solution  in  £?p  and  the  domain  of 
<p  is  suitably  restricted  to  a  subset  of  #p  containing  this 
solution  then  there  exists  X>  0  sufficiently  small  such 
that  v  satisfies  a  Lipschitz  condition  on  this  subset  and 
(4-39)  is  positive  definite.  This  implies  that  application 
of  the  map  v  to  any  element  of  the  subset  will  generate  a 
new  parameter  vector  closer  (in  norm)  to  the  solution. 
Hopefully,  repeated  application  of  ?  will  generate  a  se¬ 
quence  of  parameter  vectors  approaching  the  solution;  this 
will  be  the  case  if  each  new  parameter  vector  is  also  in  the 
restricted  domain  of  ?. 

Providing  a  guarantee  that  each  new  parameter  vector 
will  be  within  the  restricted  domain  of  ?  is  not  a  simple 
task.  Without  such  a  guarantee  it  is  possible  to  devise  a 
computational  test  to  check  for  this  condition;  then,  if  the 


'.82 


£ 


l- 


test  is  violated,  some  method  must  be  devised  to  restart  the 
iterations.  In  practice  the  situation  is  not  expected  to  be 
quite  so  pathological;  if  \  :s  selected  to  be  conservatively 
small  (.smaller  if  the  solution  is  expected  to  be  a  sharply 
peaked  spectrum)  and  a  reasonably  good  initial  estimate  is 
provided,  one  does  not  expect  to  encounter  convergence  dif¬ 
ficulties.  This  more  optimistic  approach  shall  be  taken  in 
the  following. 

To  implement  the  iterative  procedure  assume  W(e)  is 
available  in  sampled  form.  The  components  of  the  nth  iter¬ 
ate  parameter  vector  may  be  used  to  evaluate 

P 

gn(0)  =  1/1  cos (i  0 )  }  (4-49) 

1=0 

in  sampled  form.  If  the  sample  mesh  is  equally  spaced  at 


0k  =  nk/N  ;  k  =  -N, ...» 0, 1 , ... , N-1 


(4.50) 


then  the  components  may  be  computed  from 

N-1 


y(n)  -  p  _ 

m  ~  un 


w(ek)  gn(ek)  cos(mek)/2N 


(4.51  ) 


k=-N 


and  the  components  of  the  next  iterate  are  provided  by 


v(n+1)  _  v(n)  _  xrL-1 1  y^n' 
vl  v<  A  L  "o  J  t  m  m 


(4.52) 
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A  crude  test  that  the  n  n  iterate  is  in  is  provided  in 
the  course  of  these  computations  by  verifying  that  the  de¬ 


nominator  of  (4.49)  is  positive  on  the  sample  mesh. 

The  procedure  can  be  initialized  by  the  solution  to  the 
Yule-Walker  equations  where  the  elements  of  the  coefficient 
matrix  are  given  by  Equations  (3*23)  and  (3*30)  may 

then  be  used  to  evaluate  while  the  elements  [L^1 ]nm  may 

be  obtained  by  inverting  the  real  symmetric  matrix  with 
entries 

N-l 

^LoJnm  =  cosUe^)  cos(m9k)/2N  (4-53) 

k=-N 

The  coefficients  pm  may  be  evaluated  from 
N-l 

ffc  =  ^2  W(0k>  **(«*)  f(8k)  cosCme^) /2N  (4-54) 

k=-N 

Alternatively,  the  computational  methods  described  in  the 
previous  section  may  be  employed  to  evaluate  the  Pm  as 
lagged  products  of  modified  data  values. 

A  simple  test  for  iteration  completion  is  to  simply 
check  that 

*  £  tv4n)  J2  (4-55) 

m=0 

is  less  than  some  small  preselected  value.  Finally,  to 
obtain  filter  coefficients  as  are  required  by  many 
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applications,  it  is  perhaps  simplest  tc  first  compute 
correlation  values  from 


N-l 

rffl  =  ^  gn ( )  cos( mOjj)  (4.56) 

k  =  -N 

and  then  solve  the  Yule-Walker  equations. 

If  at  some  step  prior  to  iteration  completion  an  iter¬ 
ate  falls  outside  0?p,  one  may  attempt  to  reinitialize  the 
procedure  using  one  of  the  last  few  iterates  inside  5?p. 

Formulae  for  Vector  Quantization 

In  this  section  formulae  relevant  to  the  problem  of 

A 

minimizing  Iw(Hf,g)  over  a  specified  finite  collection  of 
AR(P)  model  spectra  are  developed.  Consider  first  that 
according  to  Equation  (3.24)  this  problem  is  equivalent  to 
minimizing  Iy(gj,g)  where  g^(9)  is  an  AR(P)  model  spectrum 
satisfying  Equation  (3-19).  Next,  observe  that  minimizing 
Iw(g^ »s)  is  equivalent  to  minimizing 


w(«i»g)  =  f [w(0)  Si (0)/g(8)  +  w(e)  In  g(e)3  de/2w  (4.57) 

— — TT 


Since  g(0)  is  an  AR(P)  model  given  by  Equation  (3.22)  the 
first  term  in  Equation  (4*57)  may  be  rewritten  as 


f"  P 

/  w(e)  g1(e)/g(e)  de/2w  =  yP  u 


ini  inr 


(4.58a) 


where  the  fact  that  g^O)  satisfies  Equation  (3*19)  has  been 
used  together  with  Equation  (4.1).  Similarly,  the  second 
term  in  Equation  (4*57)  may  be  rewritten  as 


J  W(e)  In  g(0)  de/2ir  =  ln(*2)  J  V (0)  d0/2ir 


¥(0)  ln[  Ap(  eie  )  Ap(e"ie)J  de/2ir  (4.58b) 


In  general  Jy(g^  ,g)  will  be  minimized  over  the  finite 
collection  of  AR(P)  spectra  by  evaluating  this  quantity  for 
each  model  spectrum  in  the  collection.  For  any  given  model 
spectrum  the  first  term  may  be  easily  evaluated  using 
(4. 58a);  the  coefficients  may  be  determined  from  the  data 
using  one  of  the  methods  outlined  in  the  second  section  of 
this  chapter.  The  second  term  presents  somewhat  greater 
difficulty;  when  W(e)  =  1  the  last  term  in  (4- 58b)  may  be 
shown  to  vanish  as  a  consequence  of  Jensen's  theorem  but,  in 
general,  this  term  will  not  vanish. 

When  W(e)  has  an  AR(M)  form  an  extension  of  Jensen's 
theorem,  which  shall  be  developed  presently,  permits  the 
evaluation  of  this  term  from  a  simple  formula.  In  order  to 
establish  the  general  theorem  it  shall  be  necessary  to  first 
establish  the  following  lemma. 


Lemma  4-1 »  Let 

P 

ApU-1)  =  JJ  (1  -  T?m  z)  (4.59) 

m=l 
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have  no  roots  inside  the  unit  circle,  r  .  If  ana  are 


also  within  the  unit  circle  then 


tin  Ap(  z-1  )  j  /  |  ( z-Tk )  ( 1  -  Ttz)\  dz/2wi 


=  {In  ApiT^1  )  |/(1-Tk  Tt  ) 


(4.60) 


Proof.  The  method  of  proof  is  essentially  the  same  as 
that  used  for  Jensen's  theorem  by  Hille  [62,  pp.  256-7 J. 
Assume  without  loss  of  generality  that  a  narrow  strip  from 
Tk  vk  =  Tk^  lTkl  ^ree  the  an<*  consider  the  inte¬ 
gral 

2<k  =  (j)  {ln[(z-Tk)/(1-  rt  z)  J  j  d[ln  Ap(  z"1  )  J/2iri  (4.61) 

around  the  contour,  V  ,  depicted  in  Figure  5-  The  loga¬ 
rithm,  determined  so  that  ln(-1 )  =  "i,  is  analytic  within 
and  Ap(z-^)  has  neither  poles  nor  zeros  within  #  so 
=  °*  Aa  the  ra^ius  the  circular  portion  of  the 
contour,  *€  ,  surrounding  the  singularity  tends  to  zero  it 
offers  no  contribution  to  this  integral.  As  the  distance 
between  the  two  straight  sections  of  the  contour  tends  to 
zero  they  provide  the  contribution 


Jrz*Tk 

[  [Ap(z  ^)]  ^  d[Ap(z  ^)] 

9S  U 


=  In  Ap(Tk1)  -  m  Ap(»'k1) 


(4.62) 
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For  the  remaining  portion  of  the  contour,  integration  by 
parts  yields 


(!  -  \rt  )  T/k  Kk  -  S<k  +  Ar/2ni 


where  the  integrated  part  is 


Ar=  {In  Ap( z_1  )  ln[  (z-rk)/(1-Tf  z)  J  (  | 


=  v, 


=  Vv 


.(  +  ) 


=  In  Ap(  )  { 2iri  +  ln[  i^-Tk)/(1-  )  J } 


-In  ApU^1)  {ln[  (  »^-Tk)/(1  -  rt  »>k )  ] } 

=  2ni  In  Ap(  ) 

Substitution  of  ( 4 • 62 )  and  (4-64)  along  with  Sjk  = 
Equation  (4*63)  completes  the  proof. 

A  simple  variable  substitution  may  be  used  to 
the  related  formula 


Lkf 


=  I  In  Ap(z) }/{ (z-Tk)  (1-  Tf  z) }  dz/2iri 

=  {in  Ap(T-1  )}/(!-  rkTf  ) 


which  together  with  (4.60)  establishes  the 


(4.63; 


(4.64) 

0  into 

obtain 


(4.65) 
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Corollary  4-1 • 


rfk  f  Tkf 


i 


|ln  Ap(  z )  Ap(z_1  )  J/i  (z-rk)(1-  T|Z)(  dz/2ni 


=  {  In  Ap(  )  Ap(  Tf  ^  )  |  /  (  1  -  ) 


(4.66) 


Finally,  sufficient  background  has  now  been  presented 
to  establish  the 

Theorem  4-1 ♦  Let  W(6)  have  an  AR(M)  form  given  by 
W(0)  =  |«(  eie ) | 2  (4.67) 


where  n(z)  has  the  partial  fraction  expansion 
M 

n(z)  =  ( 1  _  Tt  Z_1 )  =  aw/BM^ 


(4.68) 


<=1 

with  |ij|<  1.  Then  with  g(9)  given  by  equation  (2.3)  the 
second  term  in  (4*57)  is 


/: 


w(e)  In  g(e)  de/2w  =  In  cr‘ 


/: 


w(e)  de/2ir  -  t 


(4.69) 


where 


M 

£  “k  °<Tk’>  m  Apt  Tk’ ) 


T  =  2 


(4.70) 


Proof .  Using  (2.3).  (4-67).  and  (4.68) 


T 


/> 


fi(ei9)|  2  ln|Ap(elw)|^  do/2ir 


i0\  2 


-V 

M 

"k"/  © 

C,l=l  Jp 


1  In  Apia)  Ap(z-1  )  j/j  (z-Tk)(1-Tfz)| 


dz/ 2iri 
(4.71  ) 


Together  with  the  above  corollary  this  yields 
M 

T  =  ^2  / ( 1  - Tk  )}  In  Ap(rjl)  Ap(T-l)  (4.72) 

k,l=l 

and  (upon  splitting  the  logarithm  and  collecting  terms) 
Equation  (4.70) . 

With  W(0)  =  1  this  theorem  yields 
I  In  g(0)  d0 / 2*  =  In  cr 2  (4-73) 

-IT 

which  is  a  special  case  of  Jensen's  theorem  [62,  Theorem 
9-2.5]*  The  first  term  in  Equation  (4-69)  is  easy  to  com¬ 
pute  while  the  second  term,  T,  given  by  Equation  (4.70)  may 
offer  the  reader  some  difficulty.  First  observe  that  (4-70) 
requires  knowledge  of  the  parameters  of  the  partial  fraction 
expansion  (4.68).  These  are  fairly  easy  to  determine  once 


the  roots  \  of  Bj,j(z)  are  known  by  recognizing  that 
equals^  ^ 

(z-Tk)  z“P  fl(z)  =  ^(z-Tk)/{zP  BM(z)j  (4-74) 


evaluated  at  z  =  rk.  Hence,  the  basic  difficulty  is  that  of 
determing  the  roots,  t^. 

Since  extracting  the  roots  of  ( z )  can  be  a  difficult 
problem  for  large  values  of  M  it  is  advantageous  if  B^Cz)  is 
already  known  as  a  product  of  low  order  factors.  To  ac¬ 
complish  this,  recall  that  B^z)  is  determined  so  that  W(0) 

A 

approximates  H(0).  If  W(0)  is  a  product  of  known  AR(2) 
models 


W( 0 )  =  W1 (0)  V2(0) 


WM/2(0) 


(4-75) 


then  Bm(z)  is  easily  known  as  a  product  of  second  order 
factors.  In  order  to  determine  W(0)  in  this  manner  one  may 
first  determine  W^(0)  to  approximate  H(0),  then  W2(0)  to 
approximate  H(6)/Wj(0),  then  W^(0)  to  approximate 
H(0)/[W^  (0)  W2(0)j  and  so  on.  To  obtain  the  best  overall 
approximation  it  is  probably  advantageous  to  develop  some 
simple  ad  hoc  method  to  force  the  approximation  at  each 


1 1 


This  assumes  the  roots,  Tk>  are  distinct.  The 
formulae  become  mildly  more  complicated  when  this  is  not  the 
case . 
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stage  to  fit  no  more  than  one  strong  resonance  in  tne 
function  being  approximated. 


Remarks 

This  chapter  has  explored  computational  procedures 
related  to  the  weighted  information  estimation  formulation 
developed  in  Chapter  III;  it  is  worth  noting  that  none  of 
these  methods  is  entirely  satisfactory  for  all  applications. 

The  first  section  employed  an  assumed  AR(M)  form  for 
the  weight  function  which  enabled  the  problem  to  be  cast  in 
the  form  of  a  nonlinear  system  of  polynomial  equations. 
Solution  of  the  system  was  found  to  be  a.  relatively  simple 
task  for  small  values  of  M  but  one  that  becomes  rapidly  more 
complex  as  M  is  increased  beyond  four.  As  a  general  ap¬ 
proach,  the  assumption  of  a  parametric  form  for  the  weight 
function  has  considerable  promise  for  the  development  of 
efficient  computational  methods;  the  basic  difficulty  is 
that  of  finding  a  clever  parametrization  which  provides 
sufficient  flexibility  in  the  form  of  the  weight  function 
(for  the  given  application)  while  leading  to  a  simple  and 
efficient  computational  algorithm. 

The  second  section  discussed  the  computation  of  various 
coefficients  that  arise  within  the  computational  formulae. 
Choice  of  a  specific  procedure  will  ultimately  be  influenced 
by  the  demands  of  the  specific  application;  interdepeniant 
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factors  to  be  considered  include  the  quantity  of  data  avail¬ 
able,  rounding/truncation  effects,  fixed/floating  point  rep¬ 
resentation  format,  algorithm  structure,  memory  require¬ 
ments,  and  computational  speed.  The  coefficient  evaluation 
procedures  discussed  are  variants  of  methods  proposed  \,and 
sometimes  implemented)  for  real  time  speech  analysis  appli¬ 
cations  . 

The  third  section  discussed  single-step  iterative 
methods  within  the  general  framework  provided  by  the  notion 
of  a  contraction  mapping.  Multi-step  methods  were  not  dis¬ 
cussed;  in  general,  convergence  characteristics  are  more 
difficult  to  prove  for  multi-step  methods  in  spite  of  the 
fact  that  they  tend  to  converge  faster  in  practice.  These 
iterative  methods  offer  significantly  more  flexibility  in 
the  form  of  the  weight  function^  at  the  expense  of  a 
greater  computational  cost.  The  notion  of  a  contraction 
map,  sometimes  employed  for  nonconstructive  existence 
proofs,  provides  a  useful  general  framework  within  which  a 


1  2 

Faster  convergence,  in  terms  of  a  reduced  number  of 
iterations,  should  not  be  confused  with  reduced  computa¬ 
tional  cost.  Each  iteration  of  a  multi-step  method  gener¬ 
ally  is  more  expensive  computationally  than  a  comparable 
single-step  method  so  that  a  detailed  analysis  is  usually 
required  to  compare  costs. 

^That  is,  compared  to  the  parametric  approach  to 
weight  function  selection  discussed  in  the  first  section. 
In  this  sense  one  might  describe  these  methods  with  a 
seemingly  contradictory  phrase  such  as  "nonparametr ic 
autoregressive  estimation". 
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variety  of  iterative  methods  may  be  discussed;  the  specific 
method  presented  is  a  modified  Newtonian  iteration  chosen 
as  a  tradeoff  between  simplicity  and  effectiveness.  A  pos¬ 
sibly  more  effective  iterative  procedure  would  be  a  steepest 
descent  method;  generally  such  a  procedure  attempts  to  mini¬ 
mize  a  scalar  function  U  =  U(v)  by  using  a  map  with  com¬ 
ponents 

*n  =  vn  -  A  au/9vn  (4-76) 

where  the  scalar  function  A  =  A(v)  is  chosen  to  minimize 
U(v)  at  each  iteration. 

The  fourth  section  considers  the  problem  of  minimizing 
Iw(Hf,g)  over  a  given  finite  collection  of  AR(P)  models. 
The  procedure  involves  the  computation  of  a  cost  function 
for  each  model  in  the  collection.  The  cost  function  in¬ 
volves  two  terms;  the  first  term  is  evaluated  quite  simply 
(regardless  of  the  form  of  the  weight  function)  using 
formula  (4- 58a)  which  is  identical  to  one  arising  in 
"standard"  (unweighted)  vector  quantization.  The  second 
term  is  usually  quite  simple  in  "standard"  vector  quanti¬ 
zation,  see  Equation  (4-73),  but  becomes  far  more  complex 
when  the  weighted  information  formulation  is  employed. 

An  extension  of  Jensen's  theorem  provides  a  formula 
which  may  be  employed  to  evaluate  this  term  when  W(e)  has  an 
AR(M)  form;  however,  the  reader  is  admonished  to  bear  in 
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mind  that  it  is  probably  far  simpler  to  discretize  this 
integral  and  evaluate  it  numerically  as  a  sum  of  products 
from 


W(e)  In  g(  0 )  de/2ir  = 


N-l 


k=-N 


*k 


(4.77) 


where 


*k  =  [m  g(\)\/(2K) 


(4-78) 


I 


This  has  the  additional  advantage  of  not  imposing  an  AR(M) 
form  upon  the  weight  function.  More  generally,  W(e)  might 
be  expressed  as  a  sum  of  perhaps  only  a  dozen  nonnegative 


"shape  functions"  by 


W(0)  =  ^  ^4^(6)  (4.79) 

k 

so  that,  if  the  quantities 


0)  In  g(0)  d0/2* 


(4-80) 


are  precomputed  for  each  AR  model  in  the  finite  collection, 
the  second  term  may  be  easily  evaluated  from 


i: 


W( 0 )  In  g(0)  d0/2w  = 


^k 

k 


(4.81  ) 
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CHAPTER  V 


RESULTS 

In  this  chapter  the  weighted  information  estimation 
formulation  is  demonstrated  to  provide  improved  performance 
relative  to  the  noise  filtering  formulation.  It  is  worth 
noting  that,  although  existence  has  not  been  proven  in  pre¬ 
vious  chapters,  several  thousand  data  frames  have  been  ana¬ 
lyzed  using  the  weighted  information  formulation  and  not  one 
counterexample  has  been  encountered. 


Gaussian  Signals 


In  order  to  study  the  performance  of  the  weighted  in¬ 
formation  formulation  pseudorandom  sequences  were  gener¬ 
ated.  A  zero-mean  white  Gaussian  process  was  simulated 
using  a  congruential  multiplicative  random  number  generator; 
the  resulting  sequence  of  independent  uniformly  distributed 
samples  was  transformed  to  Gaussian  form  using  the  Box- 
Miiller  transformation  followed  by  Central-Limit  aver¬ 
aging.1  Zero-mean  AR(P)  Gaussian  processes  were  simulated 


1 1n  theory,  the  Box-Muller  transformation  is  adequate. 
However,  if  the  input  deviates  from  a  uniform  distribution 
the  output  will,  correspondingly,  deviate  from  a  Gaussian 
distribution;  Central-Limit  averaging  will  tend  to  reduce 
any  such  deviations  from  a  Gaussian  form. 


by  applying  the  simulated  white  Gaussian  process  to  an  all¬ 
pole  .lattice  structure)  digital  filter;  the  first  few 
thousand  output  values  from  the  filter  were  consistently 
ignored  in  order  to  avoid  the  transient  response  of  the 
filter . 

By  adding  two  independent  zero-mean  Gaussian  AR  pro¬ 
cesses  at  a  specified  signal  to  noise  ratio  appropriate  test 
data  was  produced.  For  many  of  the  examples  the  "signal" 
process  had  an  AR(2)  spectrum  defined  by  the  reflection 
coefficient  values 

k1  =  -.8  and  =  --9  (5.1) 

This  signal  process  spectrum,  evaluated  from  these  parameter 
values,  is  displayed  in  Figure  6a.  While  some  examples 
employ  a  white  Gaussian  corrupting  noise  process,  others 
employ  an  AR(2)  process  defined  by  the  reflection  coef¬ 
ficient  values 

k1  =  +.8  and  k2  =  - • 9  (5*2) 

This  "colored"  noise  process  spectrum  is  displayed  in  Figure 
bb. 

As  a  basis  for  comparison,  the  standard  autocorrelation 
analysis  method  was  applied  to  100  different  400  sample 
Hamming  windowed  frames  of  data  from  the  uncorrupted  signal 
process.  Each  resulting  estimate  is  characterized  by  a  pair 


198 


of  reflection  coefficients  which  define  a  single  dot  in 
Figure  For  this  "scatter  plot"  (and  all  subsequent 
scatter  plots)  the  ordinate  and  the  abcissa  correspond  to 
the  first  and  second  reflection  coefficients,  respectively; 
for  convenience,  cross-hairs  indicate  the  location  of  the 
true  parameter  values. 

Figures  8  and  9  each  present  various  estimates  of  a 
single  200  sample  Hamming  windowed  frame  of  data-  in  both 
cases  the  frame  of  data  consists  of  the  signal  and  colored 
noise  processes  summed  at  a  10  dB  signal  to  noise  ratio. 
The  periodogram  estimates  in  Figures  Qa  and  ga  clearly  dis¬ 
play  the  signal  resonance  (near  the  fractional  frequency 
value  of  .8)  and  the  noise  resonance  (near  the  fractional 
frequency  value  of  .2). 

Figures  8b  and  9b  display  power  spectrum  estimates 
obtained  using  the  noise  filtering  formulation.  The  esti¬ 
mate  presented  in  Figure  8b  is  a  result  of  using  the  noise 
filter  response  displayed  in  Figure  8c  which  was  determined 

p 

by  using  the  power  subtraction  rule;  similarly,  Figure  9b 
results  from  the  use  of  the  noise  filter  response  displayed 
in  Figure  9c  which  was  determined  by  using  the  magnitude 
subtraction  rule. 


o 

As  indicated  in  the  caption,  the  noise  filter  response 
was  smoothed  across  frequencies  before  being  applied.  Al¬ 
though  many  smoothing  algorithms  are  possible,  only  a  re¬ 
cursive  median  smoother  (with  a  length  about  2.5#  of  the 
displayed  bandwidth)  was  ever  employed  to  obtain  results 
presented  in  this  chapter. 
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Figures  8d  and  9d  display  power  spectrum  estimates 
obtained  using  the  weighted  information  formulation.  The 
algebraic  solution  method,  which  requires  an  Ak(M)  form  for 
the  weight  function,  was  used  in  both  cases;  coefficient 
evaluation  was  performed  using  the  mixed  time-frequency 
domain  method  presented  in  Figure  4.  The  same  noise  filter 
response  functions  were  employed  and  the  weight  functions, 
displayed  in  Figures  8e  and  9e,  were  determined  as  an  AR(2) 
fit  to  their  respective  noise  filter  response  functions. 

By  comparing  Figures  8  and  9  to  the  true  signal  spec¬ 
trum  shown  in  Figure  6a  the  deficiencies  of  these  typical 
estimates  becomes  apparent.  In  Figure  8b  the  noise 
filtering  formulation  leads  to  an  estimate  which  is  overly 
flat;  the  weighted  information  formulation  (  Figure  8d)  has 
improved  the  estimate  by  raising  the  peak  and  lowering  the 
valleys.  In  Figure  9b  the  noise  filtering  formulation  leads 
to  an  estimate  which  is  overly  sharp;  the  weighted  infor¬ 
mation  formulation  (  Figure  9d)  has  improved  the  estimate  by 
lowering  the  peak  and  raising  the  valleys.  Since  the  weight 
functions  are  similar  in  both  figures  it  is  apparent  that 
frequency  weighting  cannot  be  simply  interpreted  as 
increasing  or  decreasing  the  sharpness  of  a  spectral  esti¬ 
mate;  rather,  the  weight  function  reduces  distortions  in  the 
estimate  by  requiring  a  more  accurate  fit  to  the  data  in 
those  spectral  regions  where  the  weight  function  is  large. 

Figures  10  and  11  present  the  result  of  analyzing  100 
different  400  sample  Hamming  windowed  frames  of  data  using 
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various  different  methods.  Figure  10  presents  the  results 
obtained  using  the  noise  filtering  formulation;  the  smoothed 
noise  filter  response  was  determined  using  different  rules 
ranging  (roughly)  from  the  least  severe  rule  in  :<M  10a 
to  the  most  severe  in  Fissure  10e.  The  results  presented  in 
Figure  11  represent  an  analysis  of  the  same  100  data  frames 
and  the  same  noise  filter  response  functions  but  the  ana¬ 
lysis  uses  the  weighted  information  formulation  with  an 
AR(2)  weight  function  fit  to  the  noise  filter  response 
function. 

It  is  clear  that  in  each  case  (a  through  e)  the  esti¬ 
mation  error  is  reduced  by  the  weighted  information  formu¬ 
lation.  The  best  results  in  both  figures  are  obtained  by 
the  most  severe  rules.  Figure  10,  while  exhibiting  less 
variance,  shows  an  increased  deviation  (bias)  of  the  main 
cluster  from  the  true  values  for  the  more  severe  rules; 
apparently,  variance  error  of  the  noise  filtering  formu¬ 
lation  may  be  reduced  at  the  expense  of  increased  bias  error 
by  using  the  more  severe  rules.  Comparing,  for  example, 
Figures  1 0e  and  lie  it  is  apparent  that  the  weighted  infor¬ 
mation  formulation  achieves  still  greater  variance  reduction 
while  correcting  the  bias  error.  Comparison  of  Figures  lie 
and  7  indicate  that  one  has  little,  if  any,  hope  of 
achieving  significantly  better  performance  than  that  pro¬ 
vided  by  the  weighted  information  formulation  in  this  case. 

Figures  12  and  13  show  similar  results  for  the  same  100 
frames  of  data;  the  analysis  methods  used  to  produce  these 
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figures  differ  from  the  method  used  to  produce  ;-'i  10 
and  11  only  in  that  no  smoothing  algorithm  was  applied  to 
the  noise  filter  response.  All  the  same  trends  are  apparent 
in  figu  res  12  and  13  as  were  apparent  in  "iguroo  10  and  11; 
somewhat  greater  variance  is  exhibited  in  these  figures 
indicating  that  smoothing  produces  a  generally  beneficial 
effect  in  this  case. 

Figures  14,  15,  16,  and  17  display  similar  results  for 
the  case  of  white  corrupting  noise  at  a  10  dB  signal  to 
noise  ratio.  Again,  each  plot  represents  analysis  of  the 
same  100  different  400  sample  Hamming  windowed  frames  of 
data.  For  each  method  of  determining  noise  filter  response, 
the  weighted  information  formulation  leads  to  less  variance 
and  bias  error  than  the  comparable  (unweighted)  noise 
filtering  formulation.  As  may  be  expected,^  all  these  esti¬ 
mators  yield  poorer  performance  in  this  white  noise  case 
than  in  the  previous  colored  noise  case. 

Figures  18,  19,  20,  and  21  again  present  similar  re¬ 
sults;  while  the  corrupting  noise  is  still  white  the  signal 
to  noise  ratio  is  now  zero  dB.  One  small  difference  is 
worth  noting:  in  Figures  10  through  17  the  parts  b,  c,  and 
d  employed  a  soft  suppression  rule  with  suppression  factors 


■^The  reader  will  recall  that  if  the  signal  and  noise 
processes  are  completely  separated  in  frequency  (i.e.,  do 
not  have  overlapping  spectra),  the  Wiener  filter  can  com¬ 
pletely  eliminate  the  noise.  Since  the  colored  noise  case 
exhibits  greater  spectral  separation  from  this  signal  pro¬ 
cess  than  the  white  noise  case,  an  estimate  can  be  expected 
to  provide  superior  performance. 
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Figure  18.  Noise  Filtering:  Figure  19*  Weighted  Infonna 

Gaussian  Signal  &  White  tion:  Gaussian  Signal  & 

Noise  (0  dB)  White  Noise  (0  dB) 


of  4,  6,  and  8  respectively;  in  Figures  18  through  21  the 
parts  b,  c,  and  d  again  employ  a  soft  suppression  rule  but 
with  increased  suppression  factors  of  6,  8,  and  10  re¬ 
spectively. 


Speech  and  Speech-Like  Signals 


Many  speech  waveforms  exhibit  a  nonrandom  periodic 
character;  their  spectra  display  a  fine  harmonic  structure 
(with  peaks  separated  by  integral  multiples  of  the  pitch 
frequency)  with  a  roughly  AR  modulation.  The  harmonic 
structure  is  generally  attributed  to  the  periodic  glottal 
pulses  while  the  AR  modulation  is  generally  attributed  to 
the  response  characteristics  of  the  vocal  tract. 

To  simulate  such  waveforms  the  all  pole  filter  with 
frequency  response  displayed  in  f.'gure  6a  was  excited  with  a 
periodic  stream  of  impulses  (with  a  period  of  79  samples). 
No  figure  comparable  to  Figure  7  is  included  here  since,  in 
the  absence  of  noise,  the  analysis  of  100  different  400 
sample  Hamming  windowed  frames  of  data  (with  a  random  dis¬ 
tribution  of  phase  displacement)  presents  no  apparent  esti¬ 
mation  error.4  Consequently,  while  part  of  the  apparent 
estimation  error  in  the  scatter  plots  of  Figures  10  through 
21  must  be  attributed  to  the  random  character  of  the  signal 


4That  is,  on  the  scale  used  for  these  scatter  plots. 
On  a  greatly  enlarged  scale,  a  small  amount  of  bias  and 
variance  error  may  be  observed. 


itself,  all  of  the  apparent  estimation  error  in  the  fol¬ 
lowing  scatter  plots  (Figures  24  through  55'  may  be  attri¬ 
buted  to  the  presence  of  noise. 

Figures  22  and  23  each  present  various  estimates  of  a 
single  200  sample  Hamming  windowed  frame  of  data.  Ln  both 
cases  the  frame  of  data  consists  of  the  aforementioned  peri¬ 
odic  signal  process  and  a  colored  Gaussian  noise  process 
summed  at  a  10  dB  signal  to  noise  ratio.  The  periodogram 
estimates  in  Figures  22a  and  23a  clearly  display  the  fine 
harmonic  structure  of  the  signal  spectrum  near  the  filter 
resonance  (fractional  frequency  of  .8)  while  this  structure 
breaks  down  near  the  noise  resonance  (fractional  frequency 
of  .2). 

Figures  22b  and  23b  display  estimates  obtained  using 
the  noise  filtering  formulation;  Figures  22c  and  23c  display 
the  noise  filter  response  characteristics  that  produced 
these  estimates.  Clearly  the  estimate  appearing  in  Figure 
22b  is  overly  flat  while  the  estimate  appearing  in  Figure 
23b  is  overly  sharp.  Figures  22d  and  23d  display  the 
estimates  obtained  using  the  weighted  information 
formulation;  comparison  with  Figure  6a  reveals  that  both 
these  estimates  are  improved  relative  to  their  counterparts 
in  Figures  22b  and  23b.  Finally  the  AR(2)  weight  functions 
approximating  the  noise  filter  response  functions  are 
presented  in  Figures  22e  and  23e. 

Figures  24,  25,  26,  and  27  display  a  variety  of  scatter 
plots;  each  scatter  plot  presents  the  result  of  analyzing 
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100  different  400  sample  Hamming  windowed  frames  of  data; 
the  same  100  data  frames  were  employed  for  each  plot.  As 
mentioned  above,  because  the  signal  process  is  periodic  and 
not  random  in  character  all  of  the  apparent  estimation  error 
can  be  attributed  to  tne  added  colored  Gaussian  noise 
(SNR  =  10  dB). 

Figures  24  and  25  employ  smoothed  noise  filter  response 
characteristics  while  Figures  26  and  27  employ  the  un¬ 
smoothed  characteristics. ^  Figures  24  and  26  display  the 
results  obtained  with  the  noise  filtering  formulation  while 
Figures  25  and  27  display  the  results  obtained  with  the 
AR(2 )  weighted  information  formulation.  Once  again,  th 
weighted  information  formulation  leads  to  less  estimation 
error  than  the  comparable  noise  filtering  formulation;  in 
Figures  25d  and  25e  the  estimation  error  is  so  small  as  to 
be  almost  imperceptible  on  the  scale  employed  for  these 
plots.  Smoothing  still  appears  to  display  a  generally 
beneficial  effect. 

Figures  28,  29>  50,  and  51  present  similar  results  for 
the  case  of  white  Gaussian  noise  corruption  to  the  periodic 
signal  processes  (SNR  =  10  dB) .  As  with  the  Gaussian  signal 


5 

■'Some  caution  is  advised  regarding  the  use  of  smoothers 
here.  The  dimensions  of  the  lobes  within  the  fine  harmonic 
structure  are  controlled  by  the  length  and  shape  of  the  data 
window  so  that  a  smoother  that  works  well  with  one  frame 
length  may  not  work  well  with  longer  frames  or  a  differently 
shaped  window. 


process,  all  the  estimates  present  degraded  performance  in 
this  white  noise  case  as  compared  to  the  colored  noise  case. 

To  complete  these  simulations,  Figur<-r.  32,  33,  34,  and 
35  present  analysis  results  for  the  case  of  white  Gaussian 
noise  corruption  to  the  periodic  signal  process  at  a  zero  dB 
signal  to  noise  ratiio.  As  with  the  Gaussian  signal  pro¬ 
cess,  parts  b,  c,  and  d  of  these  figures  employ  soft  sup¬ 
pression  rules  with  increased  suppression  factors  of  6, 
8, and  10  respectively. 

The  following  summarizes  the  description  of  these  scat¬ 
ter  plots.  Figures  10-13  and  24-27  correspond  to  colored 
noise  corruption  at  a  10  dB  signal  to  noise  ratio;  Figures 
14-17  and  28-31  correspond  to  white  noise  corruption  at  a  1 0 
dB  signal  to  noise  ratio;  Figures  18-21  and  32-35  correspond 
to  white  noise  corruption  at  a  0  dB  signal  to  noise  ratio. 
Figures  10-21  correspond  to  a  Gaussian  random  signal; 
Figures  24-35  correspond  to  a  periodic  (period  =  79  samples) 
signal.  Figures  10,  11,  14,  15,  18,  19,  24,  25,  28,  29,  32, 
and  33  employ  a  smoothed  noise  filter  response  while  the 
remainder  employ  an  unsmoothed  response;  parts  a  and  e  of 
each  of  these  figures  determine  the  noise  filter  response 
using  the  power  and  magnitude  subtraction  rules  respectively 
while  parts  b,  c,  and  d  employ  the  soft  suppression  rules. 
In  Figures  10-17  and  24-31  the  suppression  factors  for  parts 
b,  c,  and  d  are  4,  6,  and  8  respectively;  in  Figures  18-21 
and  32-35  the  suppression  factors  and  6,  8,  and  10  re¬ 
spectively.  Finally,  Figures  10,  12,  14,  16,  18,  20,  24, 
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26,  28,  30,  3 2  and  34  display  the  results  of  tne  (un¬ 


weighted)  noise  filtering  analysis  while  Fipun  11,  13,  13, 
17,  19,  21,  25,  27,  29,  31,  33,  and  35  display  the  results 
of  the  AR (2)  weighted  information  analysis. 

Before  concluding  this  chapter,  several  exampLes  re¬ 
sulting  from  the  analysis  of  a  real  speech  segment  are  pro¬ 
vided.  Figure  36a  shows  a  periodogram  estimate  obtained 
from  a  Hamming  windowed  400  sample  segment  taken  from  the 
vowel  portion  of  the  word  "wrap";^  from  the  fine  harmonic 
structure  it  is  apparent  that  the  pitch  of  this  segment  is 
about  135  Hz  (about  59  samples).  Figure  36b  shows  a  tenth 
order  AR  estimate  of  the  spectrum  obtained  as  the  result  of 
an  autocorrelation  method  analysis  of  the  same  Hamming 
windowed  segment;  four  vocal  tract  resonances  are  clearly 
visible . ^ 

Figures  37a  and  37b  show  periodogram  and  tenth  order  AR 
estimates  obtained  from  this  same  vowel  segment  after  adding 
white  noise  at  a  10  dB  signal  to  noise  ratio.  Clearly,  the 
fine  harmonic  structure  of  the  periodogram  estimate  has  been 
partially  obscured  and,  while  four  resonances  are  still 
visible,  the  AR  estimate  is  severely  distorted. 


^The  word,  spoken  in  context  by  an  adult  male  in  a 
quiet  environment,  was  taken  from  the  sentence  "Don't  gift 
wrap  the  tall  glass."  and  was  appropriately  filtered  before 
sampling  at  8  kHz. 

^Lower  and  higher  order  analyses  were  applied  to  this 
segment  and  it  was  judged  from  plots  such  as  these  that  a 
tenth  order  model  is  appropriate. 
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(a)  Log  Power  Spectrum  (dB  vs  Fractional  Frequency) ;  Periodogram 
Estimate  of  Vowel  +  White  Noise  Spectrum;  SNR  =  10  dB 


Figure  37.  Vowel  Spectrum  in  White  Noise 


Figures  38  and  39  display  the  result  of  applying  vari¬ 
ous  other  estimators  to  the  same  white  noise  corrupted  data 
frame.  Figure  38  shows  results  obtained  with  the  smoothed 
power  subtraction  rule  and  figure  39  shows  results  obtained 
with  the  smoothed  magnitude  subtraction  rule.  Part  a  of 
each  figure  shows  the  result  obtained  with  the  noise  fil¬ 
tering  formulation;  the  noise  filter  response  functions  are 
displayed  in  part  b.  The  weighted  information  estimates, 
displayed  in  part  c,  were  obtained  using  the  modified  Newton 
iteration  described  in  Chapter  IV;  the  weight  functions, 
displayed  in  part  d,  were  selected  as  an  AR(b)  fit  to  the 

Q 

noise  filter  response  functions  displayed  in  part  b. 

Comparison  of  figures  38a  and  39a  to  figure  36b  reveals 
the  deficiencies  of  these  noise  filtered  estimates;  in  par¬ 
ticular,  the  reader  should  note  the  amplitude  of  the  third 
and  fourth  (highest  frequency)  resonance  peaks  as  well  as 
the  depth  of  the  valleys  near  the  fractional  frequency 
values  of  zero  and  one.  These  features  are  partially  cor¬ 
rected  in  figures  38c  and  39c  by  the  weighted  information 
formulation;  most  notable  i3  the  correction  of  the  valley 
depth  aear  the  fractional  frequency  value  of  zero.  AI30 
worth  noting  is  the  improved  valley  depth  near  the  frac¬ 
tional  frequency  of  one  in  figure  38c  and  the  improved 
amplitude  of  the  fourth  resonance  peak  in  figure  39c. 


O 

The  weight  functions  need  not  be  selected  to  have  an 
AR  form;  however,  with  this  iterative  method,  convergence  is 
more  difficult  to  achieve  with  more  complex  weight  function 
forms . 
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CHAPTER  VI 


CONCLUSION 


A  new  method  of  spectral  estimation  has  been  presented. 
The  method  addresses  the  problem  of  noise  corruption  to  the 
time  series  measurements  and  assumes  knowledge  of  the  noise 
power  spectral  density. ^  The  method  has  been  demonstrated 
to  yield  superior  performance,  in  terms  of  reduced  esti¬ 
mation  error,  and  has  been  suggested  for  use  in  speech 
analysis  applications. 

Although  the  Gaussian  assumption  is  invoked  for  the 

theoretical  development  of  the  method,  examples  have  been 

provided  that  show  the  method  yields  superior  performance 

for  other  signals  as  well.  Similarly,  the  method  is 

considered  to  be  fairly  robust  with  respect  to  the  other 
o 

assumptions.  It  is  worth  noting  that  while  the  AR  signal 
model  has  been  assumed  throughout,  this  assumption  is  by  no 
means  necessary  to  the  theoretical  development  so  that 


'Actually,  only  knowledge  of  the  frequency  response  of 
a  filter  designed  to  eliminate  the  noise  is  assumed.  Know¬ 
ledge  of  the  noise  power  spectral  density  merely  leads  to 
one  common  method  of  designing  such  a  filter. 

^A  possible  exception  is  the  assumption  of  independence 
between  the  signal  and  noise  processes  for  it  is  this  as¬ 
sumption  that  leads  to  the  model  of  additive  signal  and 
noise  power  spectral  densities. 


other  le.g.  ARMA,  Pisarenko,  etc.)  models  may  also  be  con¬ 
sidered  .  ^ 

Computational  procedures  relevant  to  the  problem  of  AR 
model  estimation  (using  the  weighted  information  formu¬ 
lation)  have  been  explored.  An  algebraic  method,  applicable 
when  the  weight  function  assumes  an  AR(M)  form,  has  been 
discussed;  when  M  _<_  4,  this  method  will  obtain  the  solution 
using  an  algorithm  of  reasonable  complexity  for  many  appli¬ 
cations.  Iterative  techniques  have  been  discussed  that 
obtain  the  solution  while  permitting  an  extremely  flexible 
class  of  weight  functions;  the  price  of  this  greater  flexi¬ 
bility  is  a  considerable  increase  in  complexity  as  well  as 
the  need  for  much  user  interaction.  Several  methods  of 
coefficient  evaluation  were  presented;  one  was  implemented 
and  used  to  obtain  the  simulation  results. 

The  problem  of  AR  model  detection  (vector  quantization) 
requires  the  evaluation  of  two  integrals  for  each  model  in 
the  finite  collection.  Evaluation  of  the  first  integral  is 
accomplished  by  Equation  (4. 58a);  this  equation  requires  the 
same  number  of  additions,  multiplications,  and  (read-only) 
storage  locations  as  is  required  by  the  usual  (unweighted) 


^The  new  formulation  would  still  require  minimization 
of  Iu(Hf,g)  and  the  analogy  leading  to  Equation  (3.20)  still 
applies.  The  only  difference  is  in  the  selection  of  a  para¬ 
metric  signal  model  and  the  system  of  equations  that  fol¬ 
lows.  Uniqueness  questions  would  need  to  be  addressed 
separately  but  one  may  hope  to  find  that  similar  convexity 
arguments  would  apply.  Of  course,  the  computational  pro¬ 
cedures  discussed  earlier  may  no  longer  be  appropriate. 
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metnods  of  vector  quantization.  Tne  second  integral  is 
evaluated  as  a  constant  {independent  of  the  data  but  de¬ 
pending  upon  the  model)  by  the  usual  (unweighted,)  methods  of 
vector  quantization;  Equation  (4*31)  is  advocated  for  evalu¬ 
ation  of  tne  second  integral  with  the  weighted  information 
formulation.  With  about  a  dozen  terms,  as  suggested  for 
speech  analysis  applications,  evaluation  of  the  second  inte¬ 
gral  using  Equation  (4.81)  is  about  equivalent  in  complexity 
to  evaluation  of  the  first  integral. 


Suggestions  for  Future  Research 


There  are  numerous  ways  to  extend  and  refine  the  ideas 
and  methods  presented  here.  The  following  suggestions, 
offered  in  no  particular  order,  are  thought  to  be  worth¬ 
while  . 


Extension  to  other  spectral  models.  As  mentioned 
earlier,  the  AR  model  form  is  not  necessary;  moreover, 
for  some  applications  it  may  not  even  be  appropriate. 


Assuming  an  AR  model,  determine  the  conditions  for  (and 
a  proof  of)  existence.  Empirical  evidence  for  ex¬ 
istence  is  strong;  it  is  thought  that  the  conditions 
are  quite  mild  from  a  practical  viewpoint  (e.g.  that 
the  weight  function  is  bounded).  While  the  question  of 
existence  is  mostly  of  theoretical  interest  by  itself; 
the  methods  used  to  prove  existence  (and  the  precise 
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conditions  for  existence)  should  have  practical  value. 
Per  example,  a  proof  Dased  upon  a  contraction  map  is 
liKely  to  yield  a  highly  effective  iterative  solution 
procedure  as  well. 

•Further  investigation  of  methods  of  coefficient  evalu¬ 
ation.  These  should  be  studied  in  close  relation  to 

the  specific  application  in  order  to  select  a  design 
offering  a  reasonable  tradeoff  between  computational 
effort  and  performance. 

•  Investigation  of  numerical  methods  for  solution  of  the 
ideal  formulation.  It  is  thought  that  the  ideal  formu¬ 
lation  should  yield  still  better  performance,  particu¬ 
larly  at  very  low  signal  to  noise  ratios;  it  is 
expected  that  these  methods  will  be  very  compu¬ 
tationally  expensive. 

•Development  of  related  formulations  assuming  a  cor¬ 
related  noise  model.  The  cross-spectrum  (between  the 
signal  and  noise  processes)  may  be  known,  say,  as  a 
function  of  the  unknown  signal  model  spectrum  and  the 
known  noise  spectrum  in  some  applications;  this  may 
occur,  for  example,  if  additive  independent  signal  and 
noise  processes  were  passed  through  a  known  nonlinear 
system  prior  to  observation. 


•  Further  investigation  of  computational  methods  ap¬ 

propriate  for  the  AR  weight  function  model;  investi¬ 
gation  of  computational  methods  appropriate  for  otner 
parametric  weight  function  models.  While  the  unique¬ 
ness  result  guarantees  that  only  one  product  model, 
Cp+M(z),  satisfying  equations  (4.13)  and  (4.15)  has  all 
its  "additional"  reflection  coefficients  |kp+^,  kp+p* 
...,  kp+^}  inside  the  interval  (-1,1)  it  is  not  known 
if  the  other  product  models  satisfying  these  equations 
have  all  their  "additional"  reflection  coefficients 
outside  this  interval  (of  course,  they  must  have  at 
least  some  of  their  "additional"  reflection 

coefficients  outside  this  interval);  if  this  were  true, 
the  development  of  an  efficient  algorithm  for  higher 
order  AR  weight  function  models  would  be  greatly 
facilitated.  In  general,  it  is  considered  that 

parametric  weight  function  models  provide  the  greatest 
hope  for  procedures  yielding  a  flexible  choice  of 
weight  function  together  with  an  efficient  solution 
algorithm. 

•  Investigation  of  the  appropriate  selection  of  "shape 

functions"  in  connection  with  use  of  the  weighted  in¬ 
formation  formulation  for  vector  quantization,  see 
equations  (4-79),  (4.80),  and  (4.81).  For  speech 

analysis  applications,  it  is  expected  that  each  shape 


function  as  the  power  spectral  response  function  of  a 
bandpass  filter  with  response  characteristics  similar 
to  those  filters  found  in  "channel  vocoder"  systems. 

•  Performance  evaluation  in  specific  [speech  analysis  and 
other)  applications  using  (global)  measures  appropriate 
to  the  particular  application.  In  a  voice  communi¬ 
cations  system  an  appropriate  measure  may  be  the  result 
of  some  formal  subjective  listening  test.  In  a  recog¬ 
nition  system  the  recognition  error  rate  may  be  an 
appropriate  measure.  Systems  that  predict  stock  market 
activity  might  measure  overall  investment  performance. 

•Extension  of  the  formulation  to  problems  of  multi¬ 
dimensional  spectral  estimation. 

* 

•Use  of  the  basic  concepts/ideas  of  the  weighted  infor¬ 
mation  formulation  to  develop  a  procedure  treating  the 
issues  of  limited  data  and  noise  corruption  simultane¬ 
ously,  perhaps  in  combination  with  notions  of  Kalman 
filtering  and  the  Burg  algorithm. 


233 


CITED  REFERENCES 


Robinson,  E.A.,  "A  Historical  Perspective  of  Spectrum 
Estimation,"  IEEE  Proc.,  Special  Issue  on  Spectral 
Estimation,  70,  9,  September,  1982. 

Schuster,  A.,  "On  the  Investigation  of  Hidden  Perio¬ 
dicities  with  Application  to  a  Supposed  26-Day 
Period  of  Meterological  Phenomena,"  Terr.  Magnet., 
3,  1898. 

Yule,  G.U.,  "On  a  Method  of  Investigating  Periodicities 
in  Disturbed  Series,  with  Special  Reference  to 
Wolfer's  Sunspot  Numbers,"  Phil.  Trans.  Roy.  Soc. 
London,  Series  A,  226,  1927- 

Walker,  G. ,  "On  Periodicity  in  Series  of  Related 
Terms,"  Proc.  Roy.  Soc.  London,  Series  A,  131, 
1931. 

Blackman,  R.B.  and  Tukey,  J.W.,  The  Measurement  of 
Power  Spectra  Prom  the  Point  of~7iew  of  Communica¬ 
tions  Engineering,  reprint  of  the  two  1958  arti¬ 
cles  ( 8.  S.  T.  J. ,  37),  New  York:  Dover,  1959* 

Fourier,  Jean  Baptiste  Joseph,  The  Analytical  Theory  of 
Heat,  reprint  of  the  18,78  translation  from  his 
celebrated  treatise  (Theorie  Analytique  de  la 
Chaleur .  Paris:  DidoTI  1822. )  elaborating  his 

controversial  1807  manuscript,  New  York:  Dover, 
1  955. 

Pisarenko,  V.F.,  "The  Retrieval  of  Harmonics  from  a 
Covariance  Function,"  Geophys.  J.  Roy.  Astron. 
Soc.,  33,  1973- 

Jaynes,  E.T.,  "Information  Theory  and  Statistical  Me¬ 
chanics  -  Part  I",  Phys.  Rev.,  106,  1957. 

_ f  "Information  Theory  and  Statistical 

Mechanics  -  Part  II",  Phys.  Rev.,  108,  1957. 

_ t  "On  the  Rationale  of  Maximum-Entropy 

Methods,"  IEEE  Proc.,  Special  Issue  on  Spectral 
Estimation,  70,  9,  September,  1982. 


-  '  V  "U 1 


■PT-T7 


r»i’7 


V  "l  •••!  *.l » J.  ■."»  ■» 


P 

fc" 

I 


I 


k 


i 

» * 


k 


C 

s.’ 

V* 


'■'.I'.I'J  1  • 


11.  Burg,  J.P.,  "Maximum  Entropy  Spectral  Analysis,"  re¬ 

print  from  the  Proc.  37th  meeting  of  the  Soc.  of 
Exploration  Geophys.  (1967),  in  Modern  Spectrum 
Analysis ,  Childers,  D.G.,  Ed.,  Hew  York :  IEEe 

Press ,  T978. 

12.  Steinnardt,  A.O.,  "An  Optimization  Theoretic  Framework 

for  Spectral  Estimation,"  Ph.D.  Thesis,  U.  of 
Colorado,  1983* 

13*  Shore,  J.E.,  and  Johnson,  R.W.,  "Axiomatic  Derivation 
of  the  Principle  of  Maximum  Entropy  and  the  Prin¬ 
ciple  of  Minimum  Cross-Entropy,"  IEEE  Trans,  on 
IT,  26,  1,  January,  1980. 

14*  Kullback,  3.,  Information  Theory  and  Statistics,  New 
York:  Wiley"J  1 959* 

15«  Shore,  J.E.,  and  Johnson,  R.W.,  "Properties  of  Cross- 
Entropy  Minimization,"  IEEE  Trans,  on  IT,  27,  4, 
July,  1982. 

16.  Jeffreys,  H.  ,  "An  Invariant  Form  for  the  Prior  Proba¬ 

bility  in  Estimation  Problems,"  Proc.  Roy.  See., 
A186,  1946. 

17.  Shore,  J.E.,  "Minimum  Cross-Entropy  Spectral  Analysis," 

IEEE  Trans,  on  ASSP,  29,  2,  April,  1981. 

18.  Akaike,  H. ,  "A  New  Look  at  the  Statistical  Model  Iden¬ 

tification,"  reprint  from  IEEE  Trans,  on  AC,  19, 
1974,  in  Modern  Spectrum  Analysis,  Childers,  D.G., 
Ed . ,  New  York:  IEEE  Press,  1978. 

19*  Parzen,  E. ,  "Some  Recent  Advances  in  Time  Series 
Modeling,"  reprint  from  IEEE  Trans,  on  AC,  19, 
1974  in  Modern  Spectrum  Analysis,  Childers,  D.G., 
Ed.,  New  York:  IEEE  tress,  1 978. 

20.  _ ,  "Time  Series  Model  Identification  by 

Estimating  Information,"  Texas  A&M  Research 
Foundation,  Tech.  Rept.  N-35,  November,  1982. 

21.  Markel,  J.D.,  and  Gray,  A.H.,  Jr.,  Linear  Prediction  of 

Speech,  New  York:  Springer-Verlag,  1 976 . 

22.  Makhoul,  J.,  "Stable  and  Efficient  Lattice  Methods  for 

Linear  Prediction,"  reprinted  from  IEEE  Trans,  on 
ASSP,  25,  1977  in  Modern  Spectrum  Analysis,  Chil¬ 
ders,  D.G.,  Ed.,  New  York:  IEEfe  tress,  1^78. 

23*  Burg,  J.P.,  "A  New  Analysis  Technique  for  Time  Series 
Data,"  reprinted  from  a  1968  presentation  at  the 
NATO  Advanced  Study  Institute  on  Signal  Processing 


235 


w 


Ed.  , 


in  Modern  Spectrum  Analysis,  Childers,  D.G., 

New  York:  lE£E  Press ,  f  9781 

24.  Itakura,  F.  and  Saito,  3.,  "Analysis  Synthesis  Tele¬ 
phony  Based  Upon  the  Maximum  Likelinood  Method," 
Reports  of  the  Sixth  Int.  Cong,  on  Acoust.,  Y. 
Kohasi,  Ed.,  Tokyo,  1968. 

29*  Robinson,  E.A.,  and  Treitel,  S.  ,  Geophysical  Signal 
Analysis ,  Englewood  Cliffs:  Prentice-Hall ,  1 9o0. 

26.  Kay,  8.M.,  "Recursive  Maximum  Likelihood  Estimation  of 
Autoregressive  Processes,"  IEEE  Trans,  on  A3SP, 
31,  1,  February,  1 983 • 

27-  Pinsker,  M.S.,  Information  and  Information  Stability  of 
Random  Variables  and  Processes,  translated  from 
the  original  published  In  Moscow  (I960),  San 
Francisco:  Holden  Day,  1964. 

28.  Parzen,  E. ,  "Time  Series  Analysis  for  Models  of  Signals 
Plus  White  Noise,"  in  Spectral  Analysis  of  Time 
Series,  Harris,  B. ,  Ed.,  New  York:  Wiley,  1966". 

29*  Pagano,  M. ,  "Estimation  of  Models  of  Autoregressive 
Signal  Plus  White  Noise,"  Ann.  of  Stat.,  2,  1, 

1974. 

30.  Wiener,  N. ,  Extrapolation,  Interpolation,  and  Smoothing 

of  Stationary  Time  Series,  New  York:  Wiley,  1949* 

31.  Kalman,  R.E.,  "A  New  Approach  to  Linear  Filtering  and 

Prediction  Problems,"  Trans,  of  ASME,  J.  of  Basic 
Eng.,  March,  I960. 

32.  Lim,  J.S.,  and  Oppenheim,  A.V.,  "Enhancement  and  Band¬ 

width  Compression  of  Noisy  Speech,"  reprinted  from 
Proc.  IEEE,  67,  12,  1979  in  Speech  Enhancement , 

Lim,  J.S.,  Ed.,  Englewood  Cliffs:  Prentice-Hall , 
1983- 

33*  Preuss,  R.D.,  "A  Frequency  Domain  Noise  Cancelling 
Preprocessor  for  Narrowband  Speech  Communication 
Systems,"  Proc.  IEEE  Int.  Conf.  on  Acoustics, 
Speech,  and  Signal  Processing,  April,  1979* 

34*  Berouti,  M. ,  Schwartz,  R. ,  and  Makhoul,  J.,  "Enhance¬ 
ment  of  Speech  Corrupted  by  Acoustic  Noise,"  re¬ 
printed  from  Proc.  IEEE  Int.  Conf.  on  Acoustics, 
Speech,  and  Signal  Processing,  April,  1979  in 
Speech  Enhancement,  Lim,  J.S.,  Ed.,  Englewood 
Cliffs:  Prentice-Hall,  1983* 


236 


■Boll,  3.F.,  "Suppression  of  Acoustic  Noise  in  Speech 
Using  Spectral  Subtraction,"  reprinted  from  IEEE 
Trans,  on  ASSP,  27,  2,  1979  in  Speech  Enhancement, 
Lim,  J.S.,  Ed.,  Englewood  Cliffs!  Prentice-Ral  1 , 
1983- 

McAulay,  R.J.  and  Malpass,  M.L.,  "Speech  Enhancement 
Using  a  Soft-Decision  Noise  Suppression  Filter," 
reprinted  from  IEEE  Trans,  on  ASSP,  28,  2,  1930  in 
Speech  Enhancement,  Lim,  J.S.,  Ed.,  Englewood 
Cliffs.  Prentice-Hall,  1933* 

Widrow,  B. ,  et  al . ,  "Adaptive  Noise  Cancelling:  Prin¬ 
ciples  and  Applications,"  reprinted  from  Proc. 
IEEE,  63,  12,  1975  in  Speech  Enhancement,  Lim, 

J.S.,  Ed.,  Englewood  Cliffs:  Prentice-Hall,  1983* 

Sambur,  M.R.,  "Adaptive  Noise  Cancelling  for  Speech 
Signals,"  IEEE  Trans,  on  ASSP,  26,  5,  1978. 

Boll,  S.F.,  and  Pulsipher,  D.C.,  "Suppression  of 
Acoustic  Noise  in  Speech  Using  Two  Microphone 
Adaptive  Noise  Cancellation,"  reprinted  from  IEEE 
Trans,  on  ASSP,  28,  6,  1980  in  Speech  Enhancement, 
Lim,  J.S.,  Ed.,  Englewood  Cliffs:  Prentice-Hall, 
1983- 

Rife,  D.C.  and  Boorstyn,  R.R.,  "Multiple  tone  parameter 
estimation  from  discrete  time  observations,"  Bell 
Syst.  Tech.  J. ,  55,  9,  1976. 

Tufts,  D.W.  and  Kumaresan,  R.  ,  "Estimation  of  Fre¬ 
quencies  of  Multiple  Sinusoids:  Making  Linear 

Prediction  Perform  Like  Maximum  Likelihood,"  Proc. 
IEEE,  Special  Issue  on  Spectral  Estimation,  70,  9, 
September,  1982. 

Cadzow,  J.A.,  Baseghi,  B. ,  and  Hsu,  T. ,  "Singular-value 
decomposition  approach  to  time  series  modelling," 
IEE  Proc.,  Part  F. ,  Special  issue  on  Spectral 
Analysis,  130,  3,  1983- 

Cramer,  H.,  Mathematical  Methods  of  Statistics, 
Princeton:  Princeton  University  Press,  1 946 . 

Kaveh,  M.,  and  Bruzzone,  S.P.,  "Statistical  efficiency 
of  correlation-based  methods  for  ARMA  spectral 
estimation,"  IEE  Proc.,  Part  F.  ,  Special  issue  on 
Spectral  Analysis,  130,  3,  1983* 


Quirk,  M.  and  Liu,  B. ,  "Improving  Resolution  For  Auto¬ 
regressive  Spectral  Estimation  By  Decimation," 
IEEE  Trans,  on  ASSP,  31,  5,  June,  1983* 


Nehorai,  A.  and  Morf,  M.  ,  "Enhancement  of  3inusoids  in 
Colored  Noise  and  Whitening  Performance  of  Exact 
Least  Squares  Predictors,"  IEEE  Trans,  on  ASSP, 
30,  3,  June,  1982. 

Wong,  D.Y.,  Juang,  B.H.,  and  Gray,  A.H.,  "An  800  bit/s 
Vector  Quantization  LPC  Vocoder,"  IEEE  Trans,  on 
ASSP,  30,  5,  1982. 

Abut,  H. ,  Gray,  R.M.,  and  Rebolledo,  G. ,  "Vector 
Quantization  of  Speech  and  Speech-Like  Waveforms," 
IEEE  Trans,  on  ASSP,  30,  3,  1982. 

Gray,  R.M.,  Gray,  A.H.,  Rebolledo,  G.,  and  Shore,  J.E., 
"Rate-distortion  speech  coding  with  a  minimum 
discrimination  information  distortion  measure," 
IEEE  Trans,  on  IT,  27,  6,  1981. 

Buzo,  A.,  Gray,  A.H.,  Gray,  R.M.,  and  Markel,  J.D., 
"Speech  Coding  Based  Upon  Vector  Quantization," 
IEEE  Trans,  on  ASSP,  28,  5,  1980. 

Linde,  Y. ,  Buzo,  A.,  and  Gray,  R.M.,  "An  Algorithm  for 
Vector  Quantizer  Design,"  IEEE  Trans,  on  COM,  28, 
1,  1980. 

Buzo  de  la  Pena,  L.A.,  "Optimal  Vector  Quantization  for 
Linear  Predictive  Coded  Speech,"  Ph.D.  thesis, 
Stanford  University,  1978. 

Suresh  Babu,  B.N.,  Volz,  B.E.,  Washburn,  S.J.,  and 
Preuss,  R.D. ,  "Software  for  Vector  Quantization," 
MTR-841 4,  Bedford,  Mass.:  The  MITRE  Corp.,  1981. 

Clapp,  R.A.J.,  "Acoustic  Segments  in  Natural  Speech: 
Analysis  and  Statistics,"  MTR-8405,  Bedford, 
Mass."  The  MITRE  Corp.,  1981. 

Chu,  P.L.,  "Frequency  Weighted  Linear  Predictive  Coding 
of  Speech,"  Ph.D.  Thesis,  University  of  Calif., 
Berkeley,  July,  1981. 

_ ,  and  Messerschmitt,  D.G.,  "A  Frequency 

Weighted  Itakura-Saito  Spectral  Distance  Measure," 
IEEE  Trans.  On  ASSP,  30,  4,  1982. 

Levinson,  N.,  "The  Wiener  RMS  Error  Criteria  in  Filter 

Design  and  Prediction,"  J.  Math.  Phys.,  25, 
pp.  261-78,  19*17 - 

Durbin,  J.,  "The  Fitting  of  Time-Series  Models,"  Rev. 

Inst.  Int.  Statist.,  28,  3,  pp.  233-43,  I960. 


Burnside,  W.S.  and  Panton,  A.W.,  The  Theory  of 

Equations ,  7  th  Ed .  ,  London:  L.  n^rnsns ,  Green,  t 
ljl2.  Also  available  as  a  reprint,  Nr  Y  or.-:: 

Dover,  I960 

Uspensky,  J.V.,  Theory  of  Equations,  New  York:  McGraw- 

Hill,  19^8. 

Collatz,  L. ,  Functional  Analysis  and  Numerical 

Mathematics  ,  translated  from  the-  h: r.  by  ''sop,  U . 
New  York:  Acauem: ?  Press,  ]Qod. 

Hille ,  E.,  Analytic  Function  Theory,  Vol.  1,  Boston: 

Ginn  and  Company,  1959. 


PUBLICATION  ACTIVITIES 


In  this  part,  a  list  of  papers  that  are  published 
and/or  to  be  published  is  given  along  with  a  brief  summary 
for  each  paper. 

List  of  papers  and  their  Summaries 

1.  L.  D.  Hoy,  D.  L.  Soldan,  and  R.  Yarlagadda,  "An  Adaptive 
Approach  to  Narrowband  Linear  Predictive  Coding  of 
Speech",  proc.  of  the  Fifteenth  Annual  Asilomar 

Conference  on  Circuits,  Systems  and  Computers,  Pacific 
Grove  CA,  pp.  331-334,  1981. 

Summary 

Linear  predictive  coding  is  an  efficient 
narrowband  coding  technique  for  speech  signals  but 
degrades  significantly  in  the  presence  of  noise.  This 
paper  examines  a  prefilter  consisting  of  an  adaptive 
digital  predictor  with  pitch  period  delay.  Preliminary 
results  indicating  the  performance  of  two  adaptive 
algorithms  are  presented.  It  is  shown  that  the  ADP  can 
improve  speech  signal  quality,  as  measured  by  signal- 
to-noise  ratios,  when  the  speech  is  corrupted  by 
wideband  noise.  The  performance  sensitivity  to  pitch 
period  errors  is  also  examined. 


2.  D.  L.  Soldan  and  L.  D.  Hoy,  "Pitch  Extraction  with  an 
Adaptive  Filter",  MIDCON/82,  Professional  Program  Session 
Record  3/2,  Dallas,  Texas,  1982. 

Summary 

This  paper  presents  a  technique  of  estimating  the 
fundamental  frequency  or  "pitch"  of  a  voiced  speech 
signal  that  is  based  on  a  tapped  delay  line  adaptive 
digital  filter  (TDLADF).  This  method  allows  better 
resolution  of  the  pitch  frequency  than  traditional 
techniques  such  as  autocorrelation  and  harmonic 
analysis.  It  also  appears  to  have  better  noise 
tolerance  than  these  techniques.  Advances  in  VLSI 
design  should  allow  real-time  processing  using  the 
TDLADF  in  the  future. 

3-  L.  Hoy,  B.  Burns,  D.  Soldan  and  R.  Yarlagadda,  "Noise 
Suppression  Methods  for  Speech  Applications",  Proc.  of 
the  1933  Int.  Conf.  on  Acoustics,  Speech  and  Signal 
Processing,  pp.  1133-1136,  Boston,  Mass.,  1983- 

Summary 

Thi3  paper  presents  a  discussion  and  evaluation  of 
several  filtering  techniques  for  suppressing  narrowband 
background  noise  in  speech  signals.  The  methods 
discussed  are  a  modified  spectral  subtraction 


242 


technique,  an  inverse  transform  filter,  an  adaptive 
notch  placement  technique,  an  adaptive  predictor,  and  a 
modification  of  the  adaptive  predictor.  Performance  of 
the  filter  methods  are  compared  using  a  spectral  error 
measurement  and  an  area  ratio  parameter  error 
measurement.  Although  the  modified  adaptive  predictor 
provided  the  best  improvement  in  spectral  error, 
results  indicate  the  modified  spectral  subtraction 
method  to  be  the  most  suitable  for  use  with  linear 
predictive  coding  systems. 

R.  D.  Preuss  and  R.  Yarlagadda,  "Autoregressive  Spectral 
Estimation  in  Noise  in  the  Context  of  Speech  Analysis", 
to  be  presented  at  the  Second  ASSP  Workshop  on  Spectral 
Estimation,  Tampa,  Florida,  1983- 

Summary 

An  improved  method  of  spectral  estimation  is 
described.  The  method  treats  the  problem  of  estimating 
autoregressive  (AR)  process  parameters  from  sequential 
discrete  time  observations  corrupted  by  additive 
independent  noise  with  known  power  spectral  density. 
The  method  has  a  theoretical  foundation  relating  it  to 
principles  of  information  theory  as  well  as  the  linear 
predictive  (LP)  procedures  popularly  employed  for 
speech  analysis.  Simulation  results  are  used  to 


support  the  theoretical  development  and  demonstrate  the 


advantages  of  the  method  as  compared  to  currently 
popular  methods  of  estimating  speech  spectra  from  noise 
corrupted  observations. 


C.  3.  Sims  and  R.  Yarlagadda,  "Design  of  Fast  Recursive 
Estimators",  to  be  published  as  a  full  paper  in  the  IEEE 
Trans,  on  Acoustics,  Speech  and  Signal  Processing. 


Summar: 


A  recursive  linear  estimator  is  proposed  for  rapid 
estimation  of  a  signal  in  noise.  Efficient  methods  are 
developed  for  optimization  of  the  filter 


coefficients. 


Optimal  selection  of  data  to  be 


processed  i3  shown  to  be  related  to  a  classic  integer 
programming  problem. 


6.  R.  Yarlagadda  and  C.  S.  Sims,  "A  Rote  on  the  General 
Discrete-Time  Linear  Estimation",  submitted  for 
publication. 


Summary 


This  paper  presents  a  simple  and  an  efficient 
algorithm  for  the  solution  of  a  generalized  least 
squares  prediction  problem.  The  derivations  are 


presented  in  terms  of  matrix  point  of  view. 


MISSION 
of 

Rome  Air  Development  Center 

RAVC  plans  and  execu tes  research,  development,  test  and 
selected  acquisition  programs  -in  Support  of  Commend,  Control 
Communications  and  Intelligence  (C5I )  activities.  Technical 
and  engineering  support  uiithin  areas  of  technical  competence 
-ls  provided  to  ESV  Program  Calces  (PCs!  and  other  ESV 
elements.  The  principal  technical  mission  areas  are 
communications ,  electromagnetic  guidance  and  control,  sur- 
ve^llance  of  ground  and  aerospace  objects,  intelligence  data 
collection  and  handling,  information  system  technology , 
solid  state  sciences,  electromagnetics  and  electronic 
reliability,  maintainability  ana  compatibility. 


Printed  by 

United  States  Air  Force 
Honscom  AFB,  Mass.  01731 


£&  X0X0X0X0, 


11-85 


DTIC 


